Energy-Efficient Transmission Scheduling with 
Strict Underflow Constraints 

David I Shuman, Mingyan Liu, and Owen Q. Wu 

Abstract 

We consider a single source transmitting data to one or more receivers/users over a shared wireless 
channel. Due to random fading, the wireless channel conditions vary with time and from user to user. 
Each user has a buffer to store received packets before they are drained. At each time step, the source 
determines how much power to use for transmission to each user. The source's objective is to allocate
power in a manner that minimizes an expected cost measure, while satisfying strict buffer underflow 
constraints and a total power constraint in each slot. The expected cost measure is composed of costs 
associated with power consumption from transmission and packet holding costs. The primary application 
motivating this problem is wireless media streaming. For this application, the buffer underflow constraints 
prevent the user buffers from emptying, so as to maintain playout quality. In the case of a single user 
with linear power-rate curves, we show that a modified base-stock policy is optimal under the finite 
horizon, infinite horizon discounted, and infinite horizon average expected cost criteria. For a single user 
with piecewise-linear convex power-rate curves, we show that a finite generalized base-stock policy is 
optimal under all three expected cost criteria. We also present the sequences of critical numbers that 
complete the characterization of the optimal control laws in each of these cases when some additional 
technical conditions are satisfied. We then analyze the structure of the optimal policy for the case of 
two users. We conclude with a discussion of methods to identify implementable near-optimal policies 
for the most general case of M users.

I. Introduction

In this paper, we examine the problem of energy-efficient transmission scheduling over a 
wireless channel, subject to underflow constraints. We consider a single source transmitting to 
one or more receivers/users over a shared wireless channel. Each user has a buffer to store 
received packets before they are drained at a certain rate. The available data rate of the channel 
varies with time and from user to user, due to random fading. The transmitter's goal is to minimize 
total power consumption by exploiting the temporal and spatial variation of the channel, while 
preventing any user's buffer from emptying. 

A. Opportunistic Scheduling and Related Work 

This problem falls into the general class of opportunistic scheduling problems, where the 
common theme is to exploit the temporal and spatial variation of the channel.^1 At a high level,
the idea of exploiting the temporal diversity of the channel via opportunistic scheduling can be 
explained as follows. Consider the case of a single sender transmitting to a single receiver with 
different linear power-rate curves for each possible channel condition. Consider one scheduling 
policy that transmits data in a just-in-time fashion, without regard to the condition of the 
time-varying channel. Over the long run, the total power consumption tends toward the power 
consumption per data packet under the average channel condition times the number of packets 
sent. If instead, the scheduler aims to send more data when the channel is in a "good" state 
(requiring less power per data packet), and less data when the channel is in a "bad" state, the total 
power consumption should be lower. Much of the challenge for the scheduler lies in determining 
how good or bad a channel condition is, and how much data to send accordingly. 
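
The intuition above can be illustrated with a small numerical sketch. It is a toy example rather than the policy analyzed in this paper: the alternating channel sequence, the threshold rule, and the names `good_threshold` and `fill_amount` are all assumptions made for illustration, and the per-slot power constraint is ignored.

```python
def just_in_time_cost(cost_per_packet, demand):
    """Total power when exactly one slot's demand is sent in every slot."""
    return sum(c * demand for c in cost_per_packet)


def opportunistic_cost(cost_per_packet, demand, good_threshold, fill_amount):
    """Total power for a toy rule that stocks up only when the channel is good.

    In a slot whose per-packet power cost is at most good_threshold, the
    sender transmits fill_amount packets; otherwise it transmits only what is
    needed to keep the buffer from underflowing in that slot.
    """
    buffer_level = 0.0
    total_power = 0.0
    for c in cost_per_packet:
        needed = max(0.0, demand - buffer_level)
        sent = fill_amount if c <= good_threshold else needed
        sent = max(sent, needed)          # never allow underflow
        total_power += c * sent
        buffer_level += sent - demand     # playout drains the buffer
    return total_power, buffer_level


# Hypothetical channel: per-packet power cost alternates between good (1) and bad (3).
channel = [1, 3, 1, 3, 1, 3]
jit = just_in_time_cost(channel, demand=1)
opp, leftover = opportunistic_cost(channel, demand=1, good_threshold=1, fill_amount=2)
print(jit, opp, leftover)
```

On this particular sequence both schedules deliver the same six packets, but the just-in-time schedule spends 12 units of power while the threshold rule spends 6, because every packet is sent in a good slot.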

Similarly, in the case of multiple receivers, the scheduler can exploit the spatial diversity of 
the channel by transmitting only to those receivers who have the best channel conditions in each 
time slot. The benefit of increasing system throughput and reducing total power consumption 
through such a joint resource allocation policy is commonly referred to as the multiuser diversity
gain [2]. It was introduced in the context of the analogous uplink problem where multiple sources
transmit to a single destination (e.g., the base station) [3]. Since then, there has been a wide range
of literature on opportunistic scheduling problems in wireless networks.

^1 Opportunistic scheduling problems are also referred to as multi-user variable channel scheduling problems [1].

Sending more data when the channel is in a good state can increase system throughput 
and/or reduce total energy consumption; however, in opportunistic scheduling problems, it is
often the case that the transmission scheduler has competing quality of service (QoS) interests.
For instance, one QoS interest commonly considered is fairness. If, when a single source is
transmitting to multiple receivers, the scheduler only considers total throughput and energy 
consumption across all users, it may often be the case that it ends up transmitting to a single 
user or the same small group of users in every slot. This can happen, for instance, if a base 
station requires less power to send data to a nearby receiver, even when the nearby receiver's 
channel is in its worst possible condition and a farther away receiver's channel is in its best 
possible condition. Thus, fairness constraints are often imposed to ensure that the transmitter 
sends packets to all receivers. 

A number of different fairness conditions have been examined in the literature. For example, 
[4] and [5] consider temporal fairness, where the scheduler must transmit to each receiver 
for some minimum fraction of the time over the long run. Under the proportional fairness 
considered by [2] and [6], the scheduler considers the current channel conditions relative to the 
average channel condition of each receiver. Reference [5] considers a more general utilitarian 
fairness, where the focus is on system performance from the receiver's perspective, rather than 
on resources consumed by each user. The authors of [7] incorporate fairness directly into the 
objective function by setting relative throughput target values for each receiver and maximizing 
the minimum relative long-run average throughput. 

Another QoS consideration that is important in many applications is delay. Different notions of 
delay have been incorporated into opportunistic scheduling problems. One proxy for delay is the 
stability of all of the sender's queues for arriving packets awaiting transmission. The motivation 
for this criterion is that if none of these queues blows up, then the delay is not "too bad." With 
stability as an objective, it is common to restrict attention to throughput optimal policies, which
are scheduling policies that ensure the sender's queues are stable, as long as this is possible
for the given arrival process and channel model. References [8]-[11] present such throughput
optimal scheduling algorithms, and examine conditions guaranteeing stabilizability in different 
settings. 

When an arriving packet model is used for the data, one can also define end-to-end delay as the 
time between a packet's arrival at the sender's buffer and its decoding by the receiver. A number 
of opportunistic scheduling studies have considered the average end-to-end delay of all packets 
over a long horizon. For instance, [12]-[23] all consider average delay, either as a constraint or by 
incorporating it directly into the objective function to be minimized. However, the average delay 
criterion allows for the possibility of long delays (albeit with small probability); thus, for many
delay-sensitive applications, strict end-to-end delay is often a more appropriate consideration 
for studies with arriving packet models. In [24] and [25], Chen, Mitra, and Neely place strict 
constraints on the end-to-end delay of each packet in a point-to-point system, examine the optimal 
scheduling policy assuming all future channel conditions are known, and suggest heuristics based 
on this optimal offline scheduling policy for the more realistic online case where the scheduler 
only learns the channel conditions in a causal fashion. Rajan, Sabharwal, and Aazhang also 
consider strict constraints on the end-to-end delay in an arriving packet model in [16, Section 
IV]. 

A strict constraint on the end-to-end delay of each packet is one particular form of a deadline 
constraint, as each packet has a deadline by which it must be transmitted. This notion can be 
generalized to impose individual deadlines on each packet, whether the packets are arriving over 
time or are all in the sender's buffer from the beginning. References [26]-[31] consider point-to- 
point communication when a fixed amount of data is in the sender's buffer at the start of the time 
horizon and the individual deadlines coincide, so that all packets must be transmitted and received 
by a common deadline, the end of the time horizon under consideration. In [26, Section III-D]
and [27, Section III-D], Fu, Modiano, and Tsitsiklis specify the optimal transmission policy when 
the power-rate curves under each channel condition are linear and the transmitter is subject to a 
per slot peak power constraint. In [28]-[31], Lee and Jindal model the power-rate curve under
each channel condition as convex, first of the form of the so-called Shannon cost function based
on the capacity of the additive white Gaussian noise channel, and then as a convex monomial
function.^2

References [32] and [33] consider opportunistic scheduling problems with multiple receivers 
and a single deadline constraint at the end of a finite horizon. Packets arrive over time and the 
emphasis is on offline scheduling policies in [32], whereas [33] considers a fixed amount of 
data destined for each receiver, and assumes the data is already in the sender's buffers at the 
beginning of the horizon. The model of [33] is perhaps the closest to our general model for M 
receivers; however, two key differences are (i) the transmitter is not subject to a power constraint 
in [33]; and (ii) the transmitter can transmit to at most one receiver in each time slot in [33]. 

In our model, the strict underflow constraints serve as a notion of both fairness and delay. 
The notion of fairness is that none of the receivers' buffers are allowed to empty, guaranteeing 
the required level of service to all users. The underflow constraints also serve as a notion of 
delay, and can be seen as multiple deadline constraints - certain packets must arrive by the end 
of the first slot, another group by the end of the second slot, and so forth. Therefore, Sections
III and IV of this paper aim to generalize the works of [26]-[27] and [28]-[31], respectively,
by considering multiple deadlines in the point-to-point communication problem, rather than a 
single deadline at the end of the horizon. In addition to better representing some delay-sensitive 
applications, this extension of the model also allows us to consider infinite horizon problems. 
We compare related work in opportunistic scheduling problems with deadline constraints further 
in [34]. For more complete surveys of opportunistic scheduling studies in wireless networks, see 
[35] and [36]. 

B. Wireless Media Streaming and Related Work 

The primary application we have in mind to motivate this problem is wireless media streaming.
For this application, the data are audio/video sequences, and the packets are drained from the
receivers' buffers in order to be decoded and played. Enforcing the underflow constraints reduces
playout interruptions to the end users. In order to make the presentation concrete, we use the
above wireless media streaming terminology throughout the paper.

^2 In our notation of Section II-A, these two cases correspond to power-rate curves of the form
$c(z, s) = \frac{2^z - 1}{g_1(s)}$ and $c(z, s) = \frac{z^{\mu}}{g_2(s)}$, respectively, where $c(z, s)$ is the power
required to transmit $z$ bits under channel condition $s$, $g_1(\cdot)$ and $g_2(\cdot)$ are known
functions, and $\mu$ is a fixed parameter.

Transporting multimedia over wireless networks is a promising application that has seen recent 
advances [37]. At the same time, a number of resource allocation issues need to be addressed 
in order to provide high quality and efficient media over wireless. First, streaming is in general 
bandwidth-demanding. Second, streaming applications tend to have stringent QoS requirements 
(e.g., they can be delay and jitter intolerant). Third, it is desirable to operate the wireless system 
in an energy-efficient manner. This is obvious when the source of the media streaming (the 
sender) is a mobile. When the media comes from a base station that is not power-constrained, 
it is still desirable to conserve power in order to (i) limit potential interference to other base 
stations and their associated mobiles, and (ii) maximize the number of receivers the sender can 
support. 

Of the related work in wireless media streaming, [38] has the closest setup to our model. 
The main differences are that [38] features a loose constraint on underflow (i.e., it is allowed, 
but at a cost), as opposed to our tight constraint, and the two studies adopt different wireless 
channel models. In the extension [39], the receiver may slow down its playout rate (at some 
cost) to avoid underflow. In this setting, the authors investigate the tradeoffs between power 
consumption and playout quality, and examine joint power/playout rate control policies. In our 
model, the receiver does not have the option to adjust the playout speeds. Our model also bears 
resemblance to [40]. The first difference here is that [40] aims to minimize transmission energy 
subject to a constant end-to-end delay constraint on each video frame. A second difference is 
that the controller in [40] must assign various source coding parameters such as quantization 
step size and coding mode, whereas our model assumes a fixed encoding/decoding scheme. 

C. Summary of Contribution 

In this paper, we formulate the task of energy-efficient transmission scheduling subject to 
strict underflow constraints as three different Markov decision problems (MDPs), with the finite
horizon discounted expected cost, infinite horizon discounted expected cost, and infinite horizon
average expected cost criteria, respectively. These three MDPs feature a continuous component 
of the state space and a continuous action space at each state. Therefore, unlike finite MDPs, they 
cannot in general be solved exactly via dynamic programming, and suffer from the well-known 
curse of dimensionality [41], [42]. Our aim in this paper is to analyze the dynamic programming 
equations in order to (i) determine if there are circumstances under which we can analytically 
derive optimal solutions to the three problems; and (ii) leverage our mathematical analysis and 
results on the structures of the optimal scheduling policies to improve our intuitive understanding 
of the problems. 

We begin by showing that in the case of a single receiver under linear power-rate curves, 
the optimal policy is an easily-implementable modified base-stock policy. In each time slot, it is
optimal for the sender to transmit so as to bring the number of packets in the receiver's buffer
after transmission as close as possible to a target level or critical number.^3 The target level
depends on the current channel condition, with a better channel condition corresponding to a 
higher target level. We also show that the strict underflow constraints may cause the scheduler 
to be less opportunistic than it otherwise would be, and transmit more packets under "medium" 
channel conditions in anticipation of deadline constraints in future time slots. 
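
The modified base-stock structure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's algorithm for computing the policy: the target level is taken as given, the power-rate curve is linear with slope `cost_per_packet`, and the function and parameter names are hypothetical.

```python
def modified_base_stock(x, target, demand, cost_per_packet, power_limit):
    """Packets to send for a single receiver under a linear power-rate curve.

    x               -- packets in the receiver buffer before transmission
    target          -- critical number for the current channel condition (assumed given)
    demand          -- packets drained from the buffer in this slot
    cost_per_packet -- power per packet under the current channel condition
    power_limit     -- per-slot power constraint P
    """
    max_packets = power_limit / cost_per_packet   # most the power constraint allows
    min_packets = max(0.0, demand - x)            # least the underflow constraint allows
    desired = max(0.0, target - x)                # fill up to the target, never send a negative amount
    # Get as close to the target as the two constraints permit.
    return min(max(desired, min_packets), max_packets)


# Hypothetical critical numbers: a better channel condition has a higher target.
targets = {"good": 5.0, "bad": 1.0}
print(modified_base_stock(x=0.0, target=targets["good"], demand=1.0,
                          cost_per_packet=2.0, power_limit=6.0))
```

The sketch relies on the model's feasibility assumption (the power limit always covers one slot's demand), so that `min_packets` never exceeds `max_packets`.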

We then generalize this result in two different directions. First, we relax the assumption that the 
power-rate curves under each channel condition are linear, and model them as piecewise-linear 
convex to better approximate more realistic convex power-rate curves. Under piecewise-linear 
power-rate curves, we show the optimal policy is a finite generalized base-stock policy, and 
provide an intuitive explanation of this structure in terms of multiple target levels in each time 
slot. In addition to the structural results on the optimal policy for the case of a single receiver 
under either linear or piecewise-linear convex power-rate curves, we provide an efficient method 
to calculate the critical numbers that complete the characterization of the optimal policy when 
certain technical conditions are satisfied. 
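
One way to picture the "multiple target levels" is with one target per linear segment of the power-rate curve. The sketch below is an illustration under assumptions, not the characterization proved in this paper: the targets are taken as given and nonincreasing across segments (cheaper segments have higher targets), every target is at least one slot's demand, and the segments together span the power budget, so the power and underflow constraints do not need separate handling.

```python
def generalized_base_stock(x, segment_widths, targets):
    """Packets to send under a piecewise-linear convex power-rate curve.

    x              -- packets in the receiver buffer before transmission
    segment_widths -- packets available on each segment, cheapest segment first
    targets        -- hypothetical target level for each segment (nonincreasing)
    """
    sent = 0.0
    for width, target in zip(segment_widths, targets):
        level = x + sent
        if level >= target:
            break                               # this segment is too expensive to use
        sent += min(target - level, width)      # fill toward the target on this segment
    return sent


# Two segments of two packets each; the cheap segment has the higher target.
for x in [0.0, 1.0, 3.5, 5.0, 7.0]:
    z = generalized_base_stock(x, segment_widths=[2.0, 2.0], targets=[6.0, 3.0])
    print(x, z, x + z)
```

In this example the buffer level after transmission is a piecewise-linear nondecreasing function of the starting level, and the amount sent is nonincreasing in the starting level, which is the shape associated with a finite generalized base-stock policy.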

^3 We use the terms target level and critical number interchangeably throughout the paper.

The second generalization of the single receiver model under linear power-rate curves is to a
single source transmitting to two receivers over a shared wireless channel. In this case, we state
and prove the structure of the optimal policy, and show how the peak power constraint in each 
slot couples the optimal scheduling of the two receivers' packet streams. 

In all three setups, we show the structure of the optimal policy in the finite horizon discounted 
expected cost problem extends to the infinite horizon discounted and average expected cost 
problems. 

Throughout the analysis, we make a novel connection with inventory models that may prove 
useful in other wireless transmission scheduling problems. Because the inventory models
corresponding to our wireless communication models have not been previously examined, our results
also represent a contribution to the inventory theory literature.

The remainder of this paper is organized as follows. In the next section, we describe the system
model, formulate finite and infinite horizon MDPs, and relate our model to models in inventory
theory. In Section III, we consider the case of a single receiver under linear power-rate curves.
While this case can be considered a special case of the models of Sections IV and V, we present
it first in order to (i) state additional structural properties of the optimal transmission policy to
a single user under linear power-rate curves that are not true in general for the cases discussed
in Sections IV and V; (ii) highlight some intuitive takeaways that carry over to the generalized
models, but are more transparent in the simpler model; and (iii) compare it to related problems
in the wireless communications literature. We analyze the structure of the optimal scheduling
policy for the finite horizon problem, provide a method to compute the critical numbers that
complete the characterization of the optimal policy when some additional technical conditions are
met, and provide sufficient conditions for this problem to be equivalent to a previously-studied
single deadline problem. Section IV generalizes the analysis of Section III to the case of a single
receiver under piecewise-linear convex power-rate curves, and also addresses the infinite horizon
problems for the case of a single receiver. In Section V, we analyze the structure of the optimal
policy when there are two receivers with linear power-rate curves. We discuss the relaxation of
the strict underflow constraints and the extension to the general case of M receivers in Section
VI. Section VII concludes the paper.



II. Problem Description 

In this section, we present an abstraction of the transmission scheduling problem outlined 
in the previous section and formulate three optimization problems. While most of this paper 
focuses on the cases of one and two users, the formulation in this section is for the more general
multi-user (multi-receiver) case, so that we can discuss this more general case in Section VI-B.



A. System Model and Assumptions 

We consider a single source transmitting media sequences to M users/receivers over a shared 
wireless channel. The sender maintains a separate buffer for each receiver, and is assumed to 
always have data to transmit to each receiver.^4 We consider a fluid packet model that allows
packets to be split, with the receiver reassembling fractional packets. Each receiver has a playout
buffer at the receiving end, assumed to be infinite. While in reality this cannot be the case, it is 
nevertheless a reasonable assumption considering the decreasing cost and size of memory, and 
the fact that our system model allows holding costs to be assessed on packets in the receiver 
buffers. See Figure 1 for a diagram of the system.

Fig. 1. System model.

^4 This assumption is commonly referred to as the infinite backlog assumption.

We consider time evolution in discrete steps, indexed backwards by $n = N, N-1, \ldots, 1$,
with $n$ representing the number of slots remaining in the time horizon. $N$ is the length of the
time horizon, and slot $n$ refers to the time interval $[n, n-1)$.

At the beginning of each time slot, the scheduler allocates some amount of power (possibly 
zero) for transmission to each user. The total power consumed in any one slot must not exceed the 
fixed power constraint, P. Following transmission and reception in each slot, a certain number of 
packets are removed/purged from each receiver buffer for playing. The transmitter (or scheduler) 
knows precisely the packet requirements of each receiver (i.e., the number of packets removed 
from the buffer) in each time slot. This is justified by the assumption that the transmitter knows 
the encoding and decoding schemes used. We assume that packets transmitted in slot n arrive 
in time to be used for playing in slot n, and that the users' consumption of packets in each 
slot is constant, denoted by $\mathbf{d} = (d^1, d^2, \ldots, d^M)$. This latter assumption is less realistic, but
may be justified if the receiving buffers are drained at a constant rate at the MAC layer, before
packets are decoded by the media players at the application layer. It is also worth noting that
the same techniques we use in this paper to analyze the constant drainage rate case can be
used to examine the case of time-varying drainage rates. We discuss the extension to the case
of time-varying drainage rates further in Section III-A. We also assume the receiver buffers are
empty at the beginning of the time horizon, and that even when the channels are in their worst
possible condition, the maximum power constraint $P$ is sufficient to transmit enough packets
to satisfy one time slot's packet requirements for every user. We discuss the relaxation of this
assumption in Section VI-A.



In general, wireless channel conditions are time-varying. Adopting a block fading model, 
we assume that the slot duration is within the channel coherence time such that the channel 
conditions within a single slot are constant. User m's channel condition in slot n is modeled 
as a random variable, $S_n^m$. We assume that the evolution of a given user's channel condition
is independent of all other users' channel conditions and the transmitter's scheduling decisions.
We also assume that the transmitter learns all the channel states through a feedback channel at
the beginning of each time slot, prior to making the scheduling decisions.

We begin by modeling the evolution of each user's channel condition as a finite-state ergodic
homogeneous Markov process, $\{S_n^m\}_{n = N, N-1, \ldots, 1}$, with state space $\mathcal{S}^m$.^5 Namely, conditioned on
the channel state, $S_n^m$, at time $n$, user $m$'s channel states at future times $(n-1, n-2, \ldots)$
are independent of the channel states at past times $(n+1, n+2, \ldots)$. Note the somewhat
unconventional notation that future times are indexed by lower epoch numbers, as $n$ represents
the number of slots remaining in the time horizon. Modeling time backwards facilitates the
analysis of the infinite horizon problems, as will be seen for example in Section IV-C. It may
also be the case that each user's channel condition is independent and identically distributed (IID)
from slot to slot. When this is the case, we can often say more about the optimal transmission
policy, as will be seen for example in Sections III-B and IV-B.



Associated with each channel condition for a given user is a power-rate function. If user m's 
channel is in condition $s^m$, then the transmission of $r$ units of data to user $m$ incurs a power
consumption of $c^m(r, s^m)$. This power-rate function $c^m(\cdot, s^m)$ is commonly assumed to be linear
(in the low SNR regime) or convex (in the high SNR regime). In this paper, we consider power- 
rate functions that are linear or piecewise-linear convex, the latter of which can be used to 
approximate more general convex power-rate functions. We assume that sending data consumes 
a strictly positive amount of power, and therefore take the power-rate functions to be strictly 
increasing under all channel conditions. 

The goal of this study is to characterize the control laws that minimize the transmission 
power and packet holding costs over a finite or infinite time horizon, subject to tight underflow 
constraints and a maximum power constraint in each time slot. 

B. Notation 

Before proceeding, we introduce some notation. We define $\mathbb{R}_+ := [0, \infty)$ and $\mathbb{N} := \{1, 2, \ldots\}$.
A single dot, as in $a \cdot b$, represents scalar multiplication. We use bold font to denote column
vectors, such as $\mathbf{w} = (w^1, w^2, \ldots, w^M)$. We include a transpose superscript whenever a vector is
meant to be a row vector, such as $\mathbf{w}^T$. The notations $\mathbf{w} \preceq \tilde{\mathbf{w}}$ and $\mathbf{w} \succeq \tilde{\mathbf{w}}$ denote component-wise
inequalities; i.e., $w^m \le$ (respectively, $\ge$) $\tilde{w}^m$, $\forall m$. Finally, we use the standard definitions of
the meet and join of two vectors. Namely,

$$\mathbf{w} \wedge \tilde{\mathbf{w}} = (w^1, w^2, \ldots, w^M) \wedge (\tilde{w}^1, \tilde{w}^2, \ldots, \tilde{w}^M) := \left( \min\{w^1, \tilde{w}^1\}, \min\{w^2, \tilde{w}^2\}, \ldots, \min\{w^M, \tilde{w}^M\} \right),$$

and

$$\mathbf{w} \vee \tilde{\mathbf{w}} = (w^1, w^2, \ldots, w^M) \vee (\tilde{w}^1, \tilde{w}^2, \ldots, \tilde{w}^M) := \left( \max\{w^1, \tilde{w}^1\}, \max\{w^2, \tilde{w}^2\}, \ldots, \max\{w^M, \tilde{w}^M\} \right).$$

^5 Theorems S y and y and their proofs remain valid as stated when each user's channel condition is given by a more
general homogeneous Markov process that is not necessarily finite-state and ergodic.
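
As a quick illustration of the notation, the component-wise order, meet, and join can be written out directly. The function names below are chosen for this sketch only.

```python
def meet(w, v):
    """Component-wise minimum of two equal-length vectors."""
    return tuple(min(a, b) for a, b in zip(w, v))


def join(w, v):
    """Component-wise maximum of two equal-length vectors."""
    return tuple(max(a, b) for a, b in zip(w, v))


def leq(w, v):
    """True when w is component-wise less than or equal to v."""
    return all(a <= b for a, b in zip(w, v))


w, v = (1, 5, 3), (4, 2, 3)
print(meet(w, v), join(w, v), leq(meet(w, v), join(w, v)))
```

Note that two vectors need not be comparable: here neither `leq(w, v)` nor `leq(v, w)` holds, while the meet always lies below the join.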

C. Problem Formulation 

We consider three problems. Problem (P1) is the finite horizon discounted expected cost
problem; Problem (P2) is the infinite horizon discounted expected cost problem; and Problem 
(P3) is the infinite horizon average expected cost problem. The three problems feature the same 
information state, action space, system dynamics, and cost structure, but different optimization 
criteria. 

The information state at time $n$ is the pair $(\mathbf{X}_n, \mathbf{S}_n)$, where the random vector
$\mathbf{X}_n = (X_n^1, X_n^2, \ldots, X_n^M)$ denotes the current receiver buffer queue lengths, and
$\mathbf{S}_n = (S_n^1, S_n^2, \ldots, S_n^M)$ denotes the channel conditions in slot $n$ (recall that $n$ is the number
of steps remaining until the end of the horizon). The dynamics for the receivers' queues are
governed by the simple equation $\mathbf{X}_{n-1} = \mathbf{X}_n + \mathbf{Z}_n - \mathbf{d}$ at all times $n = N, N-1, \ldots, 1$, where
$\mathbf{Z}_n$ is a controlled random vector chosen by the scheduler at each time $n$ that represents the
number of packets transmitted to each user in the $n$th slot. At each time $n$, $\mathbf{Z}_n$ must be chosen
to meet the peak power constraint:

$$\sum_{m=1}^{M} c^m(Z_n^m, S_n^m) \le P,$$

and the underflow constraints:

$$X_n^m + Z_n^m \ge d^m, \quad \forall m \in \{1, 2, \ldots, M\}.$$

Clearly, the scheduler cannot transmit a negative number of packets to any user, so it must also
be true that $Z_n^m \ge 0$ for all $m$.
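
The constraints and the buffer dynamics translate directly into a feasibility check and a one-slot update. The sketch below assumes each user's power-rate function for the current channel condition is supplied as a Python callable; the function names and the tolerance parameter are illustrative choices, not part of the model.

```python
def is_feasible(z, x, d, power_fns, power_limit, tol=1e-9):
    """Check nonnegativity, the underflow constraints, and the peak power constraint.

    z, x, d   -- per-user packets sent, buffer levels, and per-slot demands
    power_fns -- per-user callables giving the power needed to send z[m] packets
                 under the current channel condition (assumed given)
    """
    nonnegative = all(zm >= 0 for zm in z)
    no_underflow = all(xm + zm >= dm - tol for xm, zm, dm in zip(x, z, d))
    total_power = sum(c(zm) for c, zm in zip(power_fns, z))
    return nonnegative and no_underflow and total_power <= power_limit + tol


def next_buffers(x, z, d):
    """Buffer levels after transmission and playout: x + z - d."""
    return [xm + zm - dm for xm, zm, dm in zip(x, z, d)]


# Two users with hypothetical linear power-rate curves.
power_fns = [lambda z: 1.0 * z, lambda z: 2.0 * z]
print(is_feasible([1, 0], [0, 1], [1, 1], power_fns, power_limit=4.0))
print(next_buffers([0, 1], [1, 0], [1, 1]))
```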

We now present the optimization criterion for each problem. In addition to the cost associated 
with power consumption from transmission, we introduce holding costs on packets stored in 
each user's playout buffer at the end of a time slot. The holding costs associated with user 
$m$ in each slot are described by a convex, nonnegative, nondecreasing function, $h^m(\cdot)$, of the
packets remaining in user $m$'s buffer following playout, with $\lim_{x \to \infty} h^m(x) = \infty$. We assume
without loss of generality that $h^m(0) = 0$. Possible holding cost models include a linear model,
$h^m(x) = \hat{h}^m \cdot x$ for some positive constant $\hat{h}^m$, or a barrier-type function such as:

$$h^m(x) = \begin{cases} 0, & \text{if } x \le \mu \\ \kappa \cdot (x - \mu), & \text{if } x > \mu \end{cases} \qquad (\kappa \text{ very large}),$$

which could represent a finite receiver buffer of length $\mu$.^6
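
The two holding cost models can be written as short functions. The parameter names `rate`, `mu`, and `kappa` are labels chosen for this sketch.

```python
def linear_holding(x, rate):
    """Linear holding cost: a positive constant times the packets left after playout."""
    return rate * x


def barrier_holding(x, mu, kappa):
    """Barrier-type holding cost: free up to mu packets, steeply penalized beyond."""
    return 0.0 if x <= mu else kappa * (x - mu)


print(linear_holding(4.0, rate=0.5))
print(barrier_holding(4.0, mu=10.0, kappa=1e6), barrier_holding(11.0, mu=10.0, kappa=1e6))
```

Both functions are convex, nonnegative, nondecreasing, and zero at an empty buffer, as the model requires.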

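In Problem (P1), we wish to find a transmission policy $\pi$ that minimizes $J_{N,\alpha}^{\pi}$, the finite
horizon discounted expected cost under policy $\pi$, defined as:

$$J_{N,\alpha}^{\pi} := \mathbb{E}^{\pi} \left[ \sum_{t=1}^{N} \alpha^{N-t} \sum_{m=1}^{M} \left\{ c^m(Z_t^m, S_t^m) + h^m(X_t^m + Z_t^m - d^m) \right\} \,\middle|\, \mathcal{F}_N \right],$$

where $0 \le \alpha \le 1$ is the discount factor and $\mathcal{F}_N$ denotes all information available at the beginning
of the time horizon. For Problem (P2), the discount factor must satisfy $0 \le \alpha < 1$, and the infinite
horizon discounted expected cost function for minimization is defined as:

$$J_{\infty,\alpha}^{\pi} := \lim_{N \to \infty} J_{N,\alpha}^{\pi}.$$

For Problem (P3), the average expected cost function for minimization is defined as:

$$J_{\infty,1}^{\pi} := \limsup_{N \to \infty} \frac{1}{N} J_{N,1}^{\pi}.$$

In all three cases, we allow the transmission policy $\pi$ to be chosen from the set of all
history-dependent randomized and deterministic control laws, $\Pi$ (see, e.g., [43, Definition 2.2.3, pg.
15]).

^6 Taking $\mu$ to be greater than the time horizon $N$ in the finite horizon expected cost problem is equivalent to not assessing
any holding costs in Problem (P1).

Combining the constraints and criteria, we present the optimization formulations for Problem
(P1) (or (P2) or (P3)):

$$\inf_{\pi \in \Pi} J_{N,\alpha}^{\pi} \quad \left( \text{or } J_{\infty,\alpha}^{\pi} \text{ or } J_{\infty,1}^{\pi} \right)$$

$$\text{s.t.} \quad \sum_{m=1}^{M} c^m(Z_n^m, S_n^m) \le P, \quad \text{w.p.1}, \ \forall n,$$

$$Z_n^m \ge \max\{0, d^m - X_n^m\}, \quad \text{w.p.1}, \ \forall n, \ \forall m \in \{1, 2, \ldots, M\}.$$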

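Problem (P1) may be solved using standard dynamic programming (see, e.g., [43], [44]). The
recursive dynamic programming equations are given by:^7

$$V_n(\mathbf{x}, \mathbf{s}) = \min_{\mathbf{z} \in \mathcal{A}^{\mathbf{d}}(\mathbf{x}, \mathbf{s})} \left\{ \sum_{m=1}^{M} \left\{ c^m(z^m, s^m) + h^m(x^m + z^m - d^m) \right\} + \alpha \cdot \mathbb{E}\left[ V_{n-1}(\mathbf{x} + \mathbf{z} - \mathbf{d}, \mathbf{S}_{n-1}) \mid \mathbf{S}_n = \mathbf{s} \right] \right\}, \quad n = N, N-1, \ldots, 1, \qquad (1)$$

$$V_0(\mathbf{x}, \mathbf{s}) = 0, \quad \forall \mathbf{x} \in \mathbb{R}_+^M, \ \forall \mathbf{s} \in \mathcal{S} := \mathcal{S}^1 \times \mathcal{S}^2 \times \cdots \times \mathcal{S}^M,$$

where $V_n(\cdot, \cdot)$ is the value function or expected cost-to-go, and the action space is defined as:

$$\mathcal{A}^{\mathbf{d}}(\mathbf{x}, \mathbf{s}) := \left\{ \mathbf{z} \in \mathbb{R}_+^M : \mathbf{z} \succeq \max\{\mathbf{0}, \mathbf{d} - \mathbf{x}\} \text{ and } \sum_{m=1}^{M} c^m(z^m, s^m) \le P \right\}, \quad \forall \mathbf{x} \in \mathbb{R}_+^M, \ \forall \mathbf{s} \in \mathcal{S}, \qquad (2)$$

where the maximum in (2) is taken element-by-element (i.e., $z^m \ge \max\{0, d^m - x^m\}$, $\forall m$). Note
that our assumption that the maximum power constraint $P$ is always sufficient to transmit enough
packets to satisfy one time slot's packet requirements for every user (i.e., $\sum_{m=1}^{M} c^m(d^m, s^m) \le P$,
$\forall \mathbf{s} \in \mathcal{S}$) ensures that the action space $\mathcal{A}^{\mathbf{d}}(\mathbf{x}, \mathbf{s})$ is always non-empty.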
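
To make the recursion concrete, the following sketch runs backward induction for a heavily simplified instance. It is not the solution method developed in this paper, and every simplification is an assumption made for illustration: a single receiver, a linear power-rate curve, an IID channel, a linear holding cost, integer (rather than fluid) packets, and a buffer truncated at `x_max` so that the state space is finite.

```python
def solve_finite_horizon(horizon, demand, power_limit, cost_per_packet,
                         probs, holding_rate, alpha, x_max):
    """Backward induction for one receiver with integer packets.

    Returns (V, policy): V[x][s] is the cost-to-go with `horizon` slots left,
    and policy[n - 1][x][s] is the number of packets sent with n slots left.
    """
    num_states = len(cost_per_packet)
    V_prev = [[0.0] * num_states for _ in range(x_max + 1)]   # V_0 = 0
    policy = []
    for _ in range(horizon):
        # Expected cost-to-go of the next slot, averaged over the IID channel.
        exp_prev = [sum(p * V_prev[y][s] for s, p in enumerate(probs))
                    for y in range(x_max + 1)]
        V = [[0.0] * num_states for _ in range(x_max + 1)]
        act = [[0] * num_states for _ in range(x_max + 1)]
        for x in range(x_max + 1):
            for s, c in enumerate(cost_per_packet):
                z_min = max(0, demand - x)                  # underflow constraint
                z_max = min(int(power_limit // c),          # power constraint
                            x_max + demand - x)             # grid truncation (assumption)

                def cost(z):
                    y = x + z - demand                      # buffer level after playout
                    return c * z + holding_rate * y + alpha * exp_prev[y]

                best = min(range(z_min, z_max + 1), key=cost)
                act[x][s] = best
                V[x][s] = cost(best)
        policy.append(act)
        V_prev = V
    return V_prev, policy


# Hypothetical instance: good channel costs 1 per packet, bad channel costs 3.
V, policy = solve_finite_horizon(horizon=3, demand=1, power_limit=6,
                                 cost_per_packet=[1, 3], probs=[0.5, 0.5],
                                 holding_rate=0.0, alpha=1.0, x_max=5)
print([policy[2][x][0] for x in range(5)])   # good channel, 3 slots left
print([policy[2][x][1] for x in range(5)])   # bad channel, 3 slots left
```

In this instance, with three slots left and a good channel, the computed transmissions for starting buffer levels 0 through 4 are 3, 2, 1, 0, 0: the sender fills the buffer up to a level of 3 and sends nothing above it, which is the base-stock shape. With a bad channel and an empty buffer it sends only the single packet needed to avoid underflow.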

^7 As will be shown in the proofs of Theorems y and [10], our model satisfies the measurable selection condition 3.3.3 of [43,
pg. 28], justifying the use of min rather than inf in the dynamic programming equations.

D. Relation to Inventory Theory



The model outlined in Section II-A corresponds closely to models used in inventory theory. 
Borrowing that field's terminology, our abstraction is a multi-period, single-echelon, multi-item, 
discrete-time inventory model with random (linear or piecewise-linear convex) ordering costs, 
a budget constraint, and deterministic demands. The items correspond to the streams of data 
packets, the random ordering costs to the random channel conditions, the budget constraint to 
the power available in each time slot, and the deterministic demands to the packet requirements 
for playout. 

To the best of our knowledge, this particular problem has not been studied in the context of 
inventory theory, but similar problems have been examined, and some of the techniques from the 
inventory theory literature are useful in analyzing our model. References [45]-[52] all consider 
single-item inventory models with linear ordering costs and random prices. The key result for 
the case of deterministic demand of a single item with no resource constraint is that the optimal 
policy is a base-stock policy with different target stock levels for each price. Specifically, for 
each possible ordering price (translates into channel condition in our context), there exists a 
critical number such that the optimal policy is to fill the inventory (receiver buffer) up to that 
critical number if the current level is lower than the critical number, and not to order (transmit) 
anything if the current level is above the critical number. Of the prior work, Kingsman [47], [48] 
is the only author to consider a resource constraint, and he imposes a maximum on the number 
of items that may be ordered in each slot. The resource constraint we consider is of a different 
nature in that we limit the amount of power available in each slot. This is equivalent to a limit 
on the per slot budget (regardless of the stochastic price realization), rather than a limit on the 
number of items that can be ordered. 

Of the related work on single-item inventory models with deterministic linear ordering costs 
and stochastic demand, [53] and [54] are the most relevant; in those studies, however, the 
resource constraint also amounts to a limit on the number of items that can be ordered in 
each slot, and is constant over time. References [55]-[57] consider single-item inventory models 
with deterministic piecewise-linear convex ordering costs and stochastic demand. The key result
in this setup is that the optimal inventory level after ordering is a piecewise-linear nondecreasing
function of the current inventory level (i.e., there are a finite number of target stock levels), and the
optimal ordering quantity is a piecewise-linear nonincreasing function of the current inventory
level. Porteus [58] refers to policies of this form as finite generalized base-stock policies, to
distinguish them from the superclass of generalized base-stock policies, which are optimal when
the deterministic ordering costs are convex (but not necessarily piecewise-linear), as first studied
in [59]. Under a generalized base-stock policy, the optimal inventory level after ordering is a
nondecreasing function of the current inventory level, and the optimal ordering quantity is a
nonincreasing function of the current inventory level.

References [60]-[63] consider multi-item inventory systems under deterministic ordering costs,
stochastic demand, and resource constraints. We discuss related results from these studies in more
detail in Section V.

We are not aware of any prior work on (i) single-item inventory models with random piecewise- 
linear convex ordering costs; (ii) exact computation of the critical numbers in any sort of 
finite generalized base-stock policy; or (iii) multi-item inventory models with random ordering 
costs and joint resource constraints. Therefore, not only is this connection between wireless 
transmission scheduling problems and inventory models novel, but the results we present in this 
paper also represent a contribution to the inventory theory literature.

III. Single Receiver with Linear Power-Rate Curves 

In this section, we analyze the finite horizon discounted expected cost problem when there is
only a single receiver ($M = 1$), and the power-rate functions under different channel conditions
are linear. One such family of power-rate functions is shown in Figure 2, where there are three
possible channel conditions, and a different linear power-rate function associated with each
channel condition. Note that due to the power constraint $P$ in each slot, the effective power-rate
function is a two-segment piecewise-linear convex function under all channel conditions. We
subsequently simplify our notation and use $c_s$ to denote the power consumption per unit of data
transmitted when the channel condition is in state $s$. Because there is just a single receiver, we
also drop the dependence of the functions and random variables on $m$. We defer the infinite
horizon expected cost problems for this case until Section IV-C.



[Figure 2: power consumed (vertical axis, with the power constraint $P$ marked) versus packets
transmitted (horizontal axis), one curve per channel condition, each labeled with its slope.]

Fig. 2. A family of linear power-rate functions. Due to the power constraint, the effective power-rate function, shown above
for each of the three channel conditions, is a two-segment piecewise-linear convex function. When the channel condition is $s$,
the slope of the first segment is $c_s$.



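We denote the "best" and "worst" channel conditions by $s_{\text{best}}$ and $s_{\text{worst}}$, respectively, and
denote the slopes of the power-rate functions under these respective conditions by $c_{\min}$ and $c_{\max}$.
That is,

\[
0 < c_{s_{\text{best}}} = c_{\min} := \min_{s \in \mathcal{S}} \{c_s\} \le \max_{s \in \mathcal{S}} \{c_s\} =: c_{\max} = c_{s_{\text{worst}}} \le \frac{P}{d} .
\]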



With these notations in place, the dynamic program (1) for Problem (P1) becomes:

\begin{align}
V_n(x,s) &= \min_{\max\{0,d-x\} \le z \le \frac{P}{c_s}} \left\{ c_s \cdot z + h(x+z-d) + \alpha \cdot \mathbb{E}\left[ V_{n-1}(x+z-d, S_{n-1}) \mid S_n = s \right] \right\} \tag{3} \\
&= \min_{\max(x,d) \le y \le x + \frac{P}{c_s}} \left\{ c_s \cdot (y-x) + h(y-d) + \alpha \cdot \mathbb{E}\left[ V_{n-1}(y-d, S_{n-1}) \mid S_n = s \right] \right\} \notag \\
&= -c_s \cdot x + \min_{\max(x,d) \le y \le x + \frac{P}{c_s}} \left\{ g_n(y,s) \right\}, \quad n = N, N-1, \ldots, 1 \tag{4} \\
V_0(x,s) &= 0, \quad \forall x \in \mathbb{R}_+,\ \forall s \in \mathcal{S}, \notag
\end{align}

where $g_n(y,s) := c_s \cdot y + h(y-d) + \alpha \cdot \mathbb{E}[V_{n-1}(y-d, S_{n-1}) \mid S_n = s]$. Here, the transition from (3)
to (4) is done by a change of variable in the action space from $Z_n$ to $Y_n$, where $Y_n = X_n + Z_n$. The
controlled random variable $Y_n$ represents the queue length of the receiver buffer after transmission
takes place in the $n$th slot, but before playout takes place (i.e., before $d$ packets are removed
from the buffer). The restrictions on the action space, $\max(x,d) \le y \le x + \frac{P}{c_s}$, ensure: (i) a
nonnegative number of packets is transmitted; (ii) there are at least $d$ packets in the receiver
buffer following transmission, in order to satisfy the underflow constraint; and (iii) the power
constraint is satisfied.
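The recursion in the post-transmission variable $y$ can be sketched numerically. The Python function below is an illustration only, under assumptions that go beyond the text: packets are integers, the channel is IID, the holding cost is linear ($h(x) = h \cdot x$), and the buffer grid is truncated at a hypothetical level `x_max`. All function and variable names are invented for the example.

```python
def solve_linear_dp(N, d, P, costs, probs, h, alpha, x_max):
    """Backward recursion in the variable y on an integer grid (IID channel).

    costs[s] is the per-packet power cost c_s and probs[s] the probability of
    state s. Returns (V, b): V[x][s] approximates V_N(x, s) for x = 0..x_max,
    and b[(n, s)] is the smallest grid minimizer of g_n(., s) on [d, x_max + d].
    """
    states = range(len(costs))
    V = [[0.0] * len(costs) for _ in range(x_max + 1)]      # V_0 = 0
    b = {}
    for n in range(1, N + 1):
        # IID channel: the expected continuation cost does not depend on s.
        EV = [sum(probs[s] * V[x][s] for s in states) for x in range(x_max + 1)]
        V_new = [[0.0] * len(costs) for _ in range(x_max + 1)]
        for s in states:
            cap = int(P // costs[s])        # most packets affordable in one slot
            # g_n(y, s) = c_s*y + h*(y - d) + alpha*E[V_{n-1}(y - d, S_{n-1})]
            g = {y: costs[s] * y + h * (y - d) + alpha * EV[y - d]
                 for y in range(d, x_max + d + 1)}
            b[(n, s)] = min(g, key=lambda y: (g[y], y))
            for x in range(x_max + 1):
                y_lo = max(x, d)                  # nonnegative z and no underflow
                y_hi = min(x + cap, x_max + d)    # power limit, plus grid truncation
                V_new[x][s] = -costs[s] * x + min(g[y] for y in range(y_lo, y_hi + 1))
        V = V_new
    return V, b
```

With one slot remaining, $g_1(\cdot, s)$ is increasing, so the sketch returns `b[(1, s)] == d` for every state, and `V[x][s]` reduces to the cost of sending the shortfall `d - x` when `x < d`.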

A. Structure of Optimal Policy 

With the above change of variable in the action space, the expected cost-to-go at time
$n$, $V_n(x,s)$, depends on the current buffer level, $x$, only through the fixed term $-c_s \cdot x$ and the
action space; i.e., the function $g_n$ does not depend on $x$. This separation allows us to leverage the
inventory theory techniques of showing "single critical number" or "base-stock" policies, which
date as far back as [64]. The following theorem gives the structure of the optimal transmission
policy for the finite horizon discounted expected cost problem.

Theorem 1. For every $n \in \{1,2,\ldots,N\}$ and $s \in \mathcal{S}$, define the critical number

\[
b_n(s) := \min\left\{ \hat{y} \in [d,\infty) : g_n(\hat{y},s) = \min_{y \in [d,\infty)} g_n(y,s) \right\} .
\]

Then, for Problem (P1) in the case of a single receiver with linear power-rate curves, the optimal
buffer level after transmission with $n$ slots remaining is given by:

\[
y_n^*(x,s) := \begin{cases}
x, & \text{if } x \ge b_n(s) \\
b_n(s), & \text{if } b_n(s) - \frac{P}{c_s} \le x < b_n(s) \\
x + \frac{P}{c_s}, & \text{if } x < b_n(s) - \frac{P}{c_s}
\end{cases} , \tag{5}
\]

or, equivalently, the optimal number of packets to transmit in slot $n$ is given by:

\[
z_n^*(x,s) := \begin{cases}
0, & \text{if } x \ge b_n(s) \\
b_n(s) - x, & \text{if } b_n(s) - \frac{P}{c_s} \le x < b_n(s) \\
\frac{P}{c_s}, & \text{if } x < b_n(s) - \frac{P}{c_s}
\end{cases} . \tag{6}
\]

Furthermore, for a fixed $s$, $b_n(s)$ is nondecreasing in $n$:

\[
N \cdot d \ge b_N(s) \ge b_{N-1}(s) \ge \ldots \ge b_1(s) = d . \tag{7}
\]

If, in addition, the channel condition is independent and identically distributed from slot to slot,
then for a fixed $n$, $b_n(s)$ is nonincreasing in $c_s$; i.e., for arbitrary $s^1, s^2 \in \mathcal{S}$ with $c_{s^1} \le c_{s^2}$, we
have:

\[
n \cdot d \ge b_n(s_{\text{best}}) \ge b_n(s^1) \ge b_n(s^2) \ge b_n(s_{\text{worst}}) = d . \tag{8}
\]

The optimal transmission policy in Theorem 1 is a modified base-stock policy. At time $n$,
for each possible channel condition realization $s$, the critical number $b_n(s)$ describes the target
number of packets to have in the user's buffer after transmission in the $n$th slot. If at least that
number of packets is already in the buffer, then it is optimal to not transmit any packets; if there are
fewer than the target and the available power is enough to transmit the difference, then it is
optimal to do so; and if there are fewer than the target and the available power is not enough to
transmit the difference, then the sender should use the maximum power to transmit. See Figure
3 for diagrams of the optimal policy.

[Figure 3: (a) optimal number of packets to transmit and (b) optimal buffer level after transmission,
each plotted against the buffer level before transmission.]

Fig. 3. Optimal policy in slot $n$ when the state is $(x,s)$. (a) depicts the optimal transmission quantity, and (b) depicts the
resulting number of packets available for playout in slot $n$.

Details of the proof of Theorem 1 are included in Appendix A. The key realization is that for
all $n$ and all $s$, $g_n(\cdot,s)$ is a convex function of $y$ on $[d,\infty)$, with $\lim_{y \to \infty} g_n(y,s) = \infty$.
Thus, for all $n$ and all $s$, $g_n(\cdot,s)$ attains its global minimum at $b_n(s)$, the target number of
packets to have in the buffer following transmission in the $n$th slot. The key idea to show
(7) is to fix $s \in \mathcal{S}$, view $g_n(y,s)$ as a function of $y$ and $n$, say $f(y,n)$, and show that the
function $f(\cdot,\cdot)$ is submodular. From the proof, one can also see that if we relax the stationary
(time-invariant) deterministic demand assumption to a nonstationary (time-varying) deterministic
demand sequence, $\{d_N, d_{N-1}, \ldots, d_1\}$ (with $d_n \le \frac{P}{c_{\max}}$ for all $n$), then the structure of the optimal
policy is still as stated in (5). If the channel is IID, then the following statement, analogous to
(8), is true for arbitrary $s^1, s^2 \in \mathcal{S}$ with $c_{s^1} \le c_{s^2}$:

\[
\sum_{i=1}^{n} d_i \ge b_n(s_{\text{best}}) \ge b_n(s^1) \ge b_n(s^2) \ge b_n(s_{\text{worst}}) = d_n, \quad \forall n \in \{1,2,\ldots,N\} . \tag{9}
\]

However, (7), the monotonicity of critical numbers over time for a fixed channel condition, is not
true in general under nonstationary deterministic demand. As one counterexample, (9) says that
under an IID channel, the critical numbers for the worst possible channel condition are equal to
the single period demands. Therefore, if the demand sequence is not monotonic, the sequence
of critical numbers, $\{b_n(s_{\text{worst}})\}_{n=1,2,\ldots,N}$, is not monotonic.
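The modified base-stock rule of Theorem 1 is compact enough to sketch directly. The Python functions below evaluate the optimal post-transmission buffer level and the optimal transmission quantity for a given critical number; the function names are illustrative, and the critical number `b` is taken as an input rather than computed.

```python
def optimal_buffer_level(x, b, P, c_s):
    """Post-transmission buffer level under a modified base-stock policy.

    x: buffer level before transmission; b: critical number b_n(s);
    P: per-slot power limit; c_s: power cost per packet in the current state.
    """
    cap = P / c_s                 # most packets that can be sent this slot
    if x >= b:
        return x                  # already at or above the target: send nothing
    if x >= b - cap:
        return b                  # the target is reachable: fill up to it
    return x + cap                # the target is out of reach: use full power


def optimal_transmission(x, b, P, c_s):
    """Number of packets to send, z = y - x."""
    return optimal_buffer_level(x, b, P, c_s) - x
```

For instance, with `b = 10`, `P = 12`, and `c_s = 2` (so at most 6 packets per slot), a buffer of 12 sends nothing, a buffer of 7 sends 3 packets to reach the target, and a buffer of 1 sends the maximum of 6 packets. The three cases collapse to `min(max(x, b), x + P / c_s)`.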

B. Computation of the Critical Numbers 

In this section, we consider the special case where the channel condition is independent and
identically distributed from slot to slot, the holding cost function is linear (i.e., $h(x) = h \cdot x$
for some $h \ge 0$), and the following technical condition is satisfied: for each possible channel
condition $s$, $\frac{P}{c_s} = l \cdot d$ for some $l \in \mathbb{N}$; i.e., the maximum number of packets that can be
transmitted in any slot covers exactly the playout requirements of some integer number of slots.
Under these three assumptions, we can completely characterize the optimal transmission policy.

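Theorem 2. Define the thresholds $\gamma_{n,j}$ for $n \in \{1,2,\ldots,N\}$ and $j \in \mathbb{N}$ recursively, as follows:

(i) If $j = 1$, $\gamma_{n,j} = \infty$;

(ii) If $j > n$, $\gamma_{n,j} = 0$;

(iii) If $2 \le j \le n$,

\[
\gamma_{n,j} = -h + \alpha \cdot \left( \sum_{s:\, c_s \ge \gamma_{n-1,j-1}} p(s) \cdot \gamma_{n-1,j-1} + \sum_{s:\, c_s < \gamma_{n-1,j-1}} p(s) \cdot c_s + \sum_{s:\, c_s < \gamma_{n-1,j-1+L(s)}} p(s) \cdot \left[ \gamma_{n-1,j-1+L(s)} - c_s \right] \right) , \tag{10}
\]

where $p(s)$ is the probability of the channel being in state $s$ in a time slot, and $L(s) := \frac{P}{d \cdot c_s}$.

For each $n \in \{1,2,\ldots,N\}$ and $s \in \mathcal{S}$, if $\gamma_{n,j+1} \le c_s < \gamma_{n,j}$, define $b_n(s) := j \cdot d$. The optimal
control strategy for Problem (P1) is then given by $\pi^* = \{y_N^*, y_{N-1}^*, \ldots, y_1^*\}$, where

\[
y_n^*(x,s) := \begin{cases}
x, & \text{if } x \ge b_n(s) \\
b_n(s), & \text{if } b_n(s) - \frac{P}{c_s} \le x < b_n(s) \\
x + \frac{P}{c_s}, & \text{if } x < b_n(s) - \frac{P}{c_s}
\end{cases} . \tag{11}
\]

Note that with $n$ slots remaining, $0 = \gamma_{n,n+1} \le \gamma_{n,n} \le \gamma_{n,n-1} \le \cdots \le \gamma_{n,2} \le \gamma_{n,1} = \infty$, so
$b_n(s)$ is well-defined.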
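The threshold recursion of Theorem 2 can be sketched in a few lines of Python. This is an illustration under the theorem's assumptions (IID channel, linear holding cost, and $P/c_s$ an integer multiple of $d$); the function names and the dictionary-based storage are choices made for this example. The critical number is found as the smallest $j$ with $c_s \ge \gamma_{n,j+1}$, which coincides with the interval rule of the theorem whenever the thresholds are ordered as in the note above.

```python
import math


def compute_thresholds(N, d, P, costs, probs, h, alpha):
    """Return a function gamma(n, j) implementing the threshold recursion.

    costs[s] is c_s, probs[s] is p(s); L(s) = P / (d * c_s) is assumed integer.
    """
    L = [round(P / (d * c)) for c in costs]
    table = {}                                   # (n, j) -> gamma for 2 <= j <= n

    def gamma(n, j):
        if j == 1:
            return math.inf                      # rule (i)
        if j > n:
            return 0.0                           # rule (ii)
        return table[(n, j)]                     # rule (iii), already computed

    for n in range(2, N + 1):
        for j in range(2, n + 1):
            g_prev = gamma(n - 1, j - 1)
            total = 0.0
            for c, p, L_s in zip(costs, probs, L):
                # First two sums: p(s) * min(c_s, gamma_{n-1,j-1}).
                total += p * g_prev if c >= g_prev else p * c
                # Third sum: the extra term caused by the power constraint.
                g_far = gamma(n - 1, j - 1 + L_s)
                if c < g_far:
                    total += p * (g_far - c)
            table[(n, j)] = -h + alpha * total
    return gamma


def critical_number(gamma, n, c, d):
    """b_n(s) = j*d for the smallest j with c_s >= gamma(n, j + 1)."""
    j = 1
    while c < gamma(n, j + 1):
        j += 1
    return j * d
```

As a small check, take `d = 1`, `P = 2`, two equally likely states with costs 1 and 2, `h = 0`, and `alpha = 1`. The recursion gives `gamma(2, 2) = 1.5`, `gamma(3, 3) = 1.25`, and `gamma(4, 2) = 1.625`; the last value agrees with a direct hand evaluation of the expected cost difference between starting the next slot with empty and one-packet buffers.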

Compared to using standard numerical techniques to approximately solve the dynamic program
and find a near-optimal policy, the above result not only sheds more insight on the structural
properties of the problem and its exactly-optimal solution, but also offers a computationally
simpler method. In particular, the optimal policy is completely characterized by the thresholds
$\{\gamma_{n,j}\}_{n \in \{1,2,\ldots,N\},\, j \in \mathbb{N}}$. Calculating these thresholds recursively, as described in Theorem 2,
requires $O(N^2 |\mathcal{S}|)$ operations, which is considerably simpler from a computational standpoint
than approximately solving the dynamic program [41], [42].

To prove Theorem 2, we show by backwards induction that it is worse to transmit either fewer
or more packets than the number suggested by the policy $\pi^*$. The detailed proof is omitted, as
Theorem 2 is a special case of Theorem 4; however, we discuss some intuition behind the proof
and the thresholds here.

The reason for the technical condition regarding the maximum number of packets that can
be transmitted in any slot is as follows. The optimal action at all times (in general, without the
technical condition) is either to transmit enough packets to fill the buffer up to a level satisfying
the playout requirements of some number of future slots, or to transmit at maximum power.
When the technical condition is satisfied, transmitting at maximum power also results in filling
the buffer up to a level satisfying the playout requirements of some number of future slots.
Thus, under the optimal policy, all realizations result in the buffer level at the end of every time
slot being some integer multiple of the demand, $d$. This fact makes it easier to compute the
thresholds $\{\gamma_{n,j}\}_{n \in \{1,2,\ldots,N\},\, j \in \mathbb{N}}$.



An intuitive explanation of the recursion (10) is as follows. The threshold $\gamma_{n,j}$ may be
interpreted as the per packet power cost at which, with $n$ slots remaining in the horizon, the
expected cost-to-go of transmitting packets to cover the user's playout requirements for the next
$j - 1$ slots is the same as the expected cost-to-go of transmitting packets to cover the user's
requirements for the next $j$ slots. That is, $\gamma_{n,j}$ should satisfy:

\[
\alpha \cdot \mathbb{E}\left[ V_{n-1}((j-1) \cdot d, S_{n-1}) \right] + \gamma_{n,j} \cdot d + h \cdot d = \alpha \cdot \mathbb{E}\left[ V_{n-1}((j-2) \cdot d, S_{n-1}) \right] ,
\]

which is equivalent to:

\begin{align}
\gamma_{n,j} &= -h + \frac{\alpha}{d} \cdot \mathbb{E}\left[ V_{n-1}((j-2) \cdot d, S_{n-1}) - V_{n-1}((j-1) \cdot d, S_{n-1}) \right] \notag \\
&= -h + \frac{\alpha}{d} \cdot \sum_{s \in \mathcal{S}} p(s) \cdot \left[ V_{n-1}((j-2) \cdot d, s) - V_{n-1}((j-1) \cdot d, s) \right] \tag{12} \\
&= -h + \frac{\alpha}{d} \cdot \Bigg\{ \sum_{s:\, b_{n-1}(s) \le (j-2) \cdot d} p(s) \cdot \left( -h \cdot d + \alpha \cdot \mathbb{E}\left[ V_{n-2}((j-3) \cdot d, S_{n-2}) - V_{n-2}((j-2) \cdot d, S_{n-2}) \right] \right) \notag \\
&\qquad + \sum_{s:\, (j-2) \cdot d < b_{n-1}(s) \le (j-2+L(s)) \cdot d} p(s) \cdot c_s \cdot d \notag \\
&\qquad + \sum_{s:\, b_{n-1}(s) > (j-2+L(s)) \cdot d} p(s) \cdot \left( -h \cdot d + \alpha \cdot \mathbb{E}\left[ V_{n-2}((j-3+L(s)) \cdot d, S_{n-2}) - V_{n-2}((j-2+L(s)) \cdot d, S_{n-2}) \right] \right) \Bigg\} \tag{13} \\
&= -h + \alpha \cdot \Bigg\{ \sum_{s:\, b_{n-1}(s) \le (j-2) \cdot d} p(s) \cdot \gamma_{n-1,j-1} + \sum_{s:\, (j-2) \cdot d < b_{n-1}(s) \le (j-2+L(s)) \cdot d} p(s) \cdot c_s \notag \\
&\qquad + \sum_{s:\, b_{n-1}(s) > (j-2+L(s)) \cdot d} p(s) \cdot \gamma_{n-1,j-1+L(s)} \Bigg\} \tag{14} \\
&= -h + \alpha \cdot \Bigg\{ \sum_{s:\, c_s \ge \gamma_{n-1,j-1}} p(s) \cdot \gamma_{n-1,j-1} + \sum_{s:\, \gamma_{n-1,j-1+L(s)} \le c_s < \gamma_{n-1,j-1}} p(s) \cdot c_s \notag \\
&\qquad + \sum_{s:\, c_s < \gamma_{n-1,j-1+L(s)}} p(s) \cdot \gamma_{n-1,j-1+L(s)} \Bigg\} . \tag{15}
\end{align}

Here, (13) follows from the structure of the optimal control action (5). If the channel condition
$s$ in slot $n - 1$ is such that $b_{n-1}(s) \le (j-2) \cdot d$, then no packets are transmitted when
the starting buffer level is either $(j-2) \cdot d$ or $(j-1) \cdot d$, and the respective buffer levels at the
beginning of slot $n - 2$ are $(j-3) \cdot d$ and $(j-2) \cdot d$. The instantaneous costs resulting from the
two starting buffer levels differ by $-h \cdot d$. When $(j-2) \cdot d < b_{n-1}(s) \le (j-2+L(s)) \cdot d$, the
power constraint is not tight starting from $(j-1) \cdot d$, so the buffer level after transmission is
the same starting from $(j-2) \cdot d$ or $(j-1) \cdot d$. The instantaneous costs resulting from the two
starting buffer levels differ by $c_s \cdot d$, as an extra $d$ packets are transmitted if the starting buffer is
$(j-2) \cdot d$. Finally, when $b_{n-1}(s) > (j-2+L(s)) \cdot d$, the power constraint is tight starting from
both $(j-2) \cdot d$ and $(j-1) \cdot d$. Therefore, the instantaneous cost difference is $-h \cdot d$, and the
respective buffer levels at the beginning of slot $n - 2$ are $(j-3+L(s)) \cdot d$ and $(j-2+L(s)) \cdot d$.
Equation (14) follows from (12), with $n - 1$ substituted for $n$, and $j - 1$ or $j - 1 + L(s)$ substituted
for $j$, and (15) follows from the definition that $b_n(s) = j \cdot d$ if $\gamma_{n,j+1} \le c_s < \gamma_{n,j}$. Note that (15)
is the same as (10), because $\gamma_{n-1,j-1+L(s)} \le \gamma_{n-1,j-1}$.



Comparing the threshold $\gamma_{n,j}$ defined in (10) to the corresponding threshold in the unrestricted
(no power constraint) single user problem [47], [52], the only difference is the third term of the
right-hand side of (10):

\[
\sum_{s:\, c_s < \gamma_{n-1,j-1+L(s)}} p(s) \cdot \left[ \gamma_{n-1,j-1+L(s)} - c_s \right] ,
\]

which is absent in the unrestricted case. For all $n \in \{1,2,\ldots,N\}$ and $j \in \mathbb{N}$, this term is
nonnegative. Thus, for a fixed $n$ and $j$, the threshold in the restricted case is at least as high as
the corresponding threshold in the unrestricted case. It follows that the optimal stock-up level
is also at least as high in the restricted case for all $n \in \{1,2,\ldots,N\}$ and $s \in \mathcal{S}$. The
intuition behind this difference is that the sender should transmit more packets under the same
(medium) conditions, because it is not able to take advantage of the best channel conditions to
the same extent due to the power constraint.

C. Sufficient Conditions for Equivalence with the Single Deadline Problem 

In [27, Section III-D], Fu, Modiano, and Tsitsiklis consider the related single user problem
of transmitting a given amount of data with minimum energy by a fixed deadline. They also
represent the fading channel by a linear power-rate function with a different slope in each
channel condition, and consider a power constraint $P$ in each slot. There is just a single explicit
underflow constraint (the deadline) in their problem; however, because the terminal cost is set
to $\infty$ if all the data is not transmitted by the deadline, the scheduler must transmit enough
data in each slot so that it can still complete the job if the channel is in the worst possible
condition in all subsequent slots. Thus, if $d_{\text{total}}$ is the total amount of data that must be sent by
the deadline and $d_{\text{worst}}$ is the amount that can be sent in a slot under the worst channel condition,
the transmitter must have sent at least $d_{\text{total}} - d_{\text{worst}}$ packets by the beginning of the last slot,
at least $d_{\text{total}} - 2 \cdot d_{\text{worst}}$ packets by the beginning of the second to last slot, and so forth.* So
there are in fact implicit constraints on how much data must be transmitted by the end of slots
$N - \left\lceil \frac{d_{\text{total}}}{d_{\text{worst}}} \right\rceil + 1,\ N - \left\lceil \frac{d_{\text{total}}}{d_{\text{worst}}} \right\rceil + 2,\ \ldots,\ N-2,\ N-1$. With this interpretation, we believe that
our Theorem 2 is equivalent to Theorem 3 and its corollary in [27] in the special case that, in
addition to the hypotheses of our Theorem 2, $\alpha = 1$, $h = 0$, and $L(s_{\text{worst}}) = 1$. For, when these
conditions are met, the implicit constraints in [27] coincide exactly with the explicit underflow
constraints in our problem. Of course, when these three conditions are not satisfied, the two
problems are quite different. For a more detailed comparison of these two problems, see [34].

* An unstated assumption in the formulation in [27, Section III-D] is that $d_{\text{worst}}$ times the horizon length must be at least as
large as $d_{\text{total}}$.
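The implicit constraints of the single deadline problem can be tabulated with a short Python sketch. It assumes slots are numbered forward from 1 to N (a convention chosen for this example) and uses hypothetical names; it simply evaluates how much data must have been sent by the end of each slot so that the job can still finish if every remaining slot sees the worst channel.

```python
import math


def min_cumulative_sent(d_total, d_worst, N):
    """Minimum cumulative data sent by the end of each slot t = 1..N.

    After slot t there are N - t slots left, each able to carry at least
    d_worst, so at least d_total - (N - t) * d_worst must already be sent.
    """
    return {t: max(0, d_total - (N - t) * d_worst) for t in range(1, N + 1)}


def first_constrained_slot(d_total, d_worst, N):
    """Earliest slot whose end-of-slot requirement is strictly positive."""
    return N - math.ceil(d_total / d_worst) + 1
```

With `d_total = 10`, `d_worst = 4`, and `N = 5`, the requirements are 0, 0, 2, 6, and 10 packets by the end of slots 1 through 5, so the first slot carrying an implicit constraint is slot 3.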

D. Intuitive Takeaways on the Role of the Strict Underflow Constraints 

As mentioned earlier, the main idea of energy-efficient communication over a fading channel 
via opportunistic scheduling is to minimize power consumption by transmitting more data when 
the channel is in a "good" state, and less data when the channel is in a "bad" state. However, 
in order to comply with the underflow or deadline constraints, the transmitter may be forced to 
send data under poor channel conditions. 

One intuitive takeaway from the analysis is that it is better to anticipate the need to comply 
with these constraints in future slots by sending more packets (than one would without the 
deadlines) under "medium" channel conditions in earlier slots. Doing so is a way to manage the 
risk of being stuck sending a large amount of data over a poor channel to meet an imminent 
deadline constraint. Another intuitive takeaway is that the closer the deadlines and the more 
deadlines it faces, the less "opportunistic" the scheduler can afford to be. In summary, both the 
underflow constraints and the power constraints shift the definition of what constitutes a "good" 
channel, and how much data to send accordingly. For more detailed comparisons of single- 
receiver opportunistic scheduling problems highlighting the role of the deadline constraints, see 
[34]. 

IV. Single Receiver with Piecewise-Linear Convex Power-Rate Curves 

In this section, we analyze Problems (P1), (P2), and (P3) when there is only a single receiver
($M = 1$), and the power-rate functions under different channel conditions are piecewise-linear
convex. Note that this is a generalization of the case considered in Section III.

We assume without loss of generality that under each channel condition $s$, the power-rate
function has $K + 1$ segments, and thus the power consumed in transmitting $z$ packets under
channel condition $s$ can be represented as follows:

\[
c(z,s) = z \cdot \tilde{c}_0(s) + \sum_{k=0}^{K-1} \left( \tilde{c}_{k+1}(s) - \tilde{c}_k(s) \right) \cdot \max\{ z - \tilde{z}_k(s), 0 \} , \quad \text{where}
\]

\[
0 < \tilde{c}_0(s) \le \tilde{c}_1(s) \le \cdots \le \tilde{c}_K(s) , \quad \text{and}
\]

\[
0 = \tilde{z}_{-1}(s) \le \tilde{z}_0(s) \le \tilde{z}_1(s) \le \cdots \le \tilde{z}_{K-1}(s) \le \tilde{z}_K(s) = \infty .
\]

The terms $\{\tilde{c}_k(s)\}_{k \in \{0,1,\ldots,K\}}$ represent the slopes of the segments of $c(\cdot,s)$, and the terms
$\{\tilde{z}_k(s)\}_{k \in \{0,1,\ldots,K-1\}}$ represent the points at which the slopes of $c(\cdot,s)$ change. An example of
a family of such power-rate functions is shown in Figure 4. For each channel condition $s \in \mathcal{S}$,
we define the maximum number of packets that can be transmitted without exceeding the per
slot power constraint $P$ as:

\[
\tilde{z}_{\max}(s) := \{ z : c(z,s) = P \} .
\]

Note that $\tilde{z}_{\max}(s)$ is well-defined due to the strictly increasing nature of $c(\cdot,s)$. Recall that
we assume $\tilde{z}_{\max}(s) \ge d$, $\forall s \in \mathcal{S}$. We also assume without loss of generality that $\tilde{z}_{\max}(s) >
\tilde{z}_{K-1}(s)$, $\forall s \in \mathcal{S}$.
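A piecewise-linear convex power-rate function of this form, and the corresponding maximum number of packets per slot, can be sketched as follows. The Python below is a minimal illustration for a single channel condition; the list-based representation of slopes and breakpoints and the function names are assumptions made for the example.

```python
import math


def power_cost(z, slopes, breakpoints):
    """Power needed to send z packets.

    slopes: segment slopes (K + 1 values, positive and nondecreasing).
    breakpoints: points where the slope changes (K values, nondecreasing).
    """
    cost = z * slopes[0]
    for k, z_k in enumerate(breakpoints):
        # Each breakpoint adds the slope increment to the packets beyond it.
        cost += (slopes[k + 1] - slopes[k]) * max(z - z_k, 0.0)
    return cost


def max_packets(P, slopes, breakpoints):
    """Largest z with power_cost(z) <= P, found segment by segment."""
    z_lo, spent = 0.0, 0.0
    for k, slope in enumerate(slopes):
        z_hi = breakpoints[k] if k < len(breakpoints) else math.inf
        segment_cost = slope * (z_hi - z_lo)
        if spent + segment_cost >= P:
            return z_lo + (P - spent) / slope   # budget runs out on this segment
        spent += segment_cost
        z_lo = z_hi
```

For slopes `[1, 2, 4]` and breakpoints `[2, 5]`, sending 6 packets costs 6 + 4 + 2 = 12 power units, so with `P = 12` the maximum number of packets per slot is 6.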

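In this case, the dynamic program (1) for Problem (P1) becomes:

\begin{align}
V_n(x,s) &= \min_{\max\{0,d-x\} \le z \le \tilde{z}_{\max}(s)} \left\{ c(z,s) + h(x+z-d) + \alpha \cdot \mathbb{E}\left[ V_{n-1}(x+z-d, S_{n-1}) \mid S_n = s \right] \right\} \notag \\
&= \min_{\max\{0,d-x\} \le z \le \tilde{z}_{\max}(s)} \left\{ c(z,s) + g_n(x+z,s) \right\}, \quad n = N, N-1, \ldots, 1 \tag{16} \\
V_0(x,s) &= 0, \quad \forall x \in \mathbb{R}_+,\ \forall s \in \mathcal{S}, \notag
\end{align}

where $g_n(y,s) := h(y-d) + \alpha \cdot \mathbb{E}[V_{n-1}(y-d, S_{n-1}) \mid S_n = s]$.

[Figure 4: power consumed versus packets transmitted, one piecewise-linear convex curve per channel condition.]

Fig. 4. A family of piecewise-linear convex power-rate functions. Like Figure 2, we incorporate the power constraint into each
curve to show the effective power-rate curve. As an example, the power-rate function $c(\cdot, s_{\text{POOR}})$ is completely characterized
by the sequence of slopes $\{\tilde{c}_k(s_{\text{POOR}})\}_{k \in \{0,1,2,3\}}$ and the sequence of points where the slopes change $\{\tilde{z}_k(s_{\text{POOR}})\}_{k \in \{0,1,2\}}$.
The maximum number of packets that can be transmitted in a slot when the channel condition is $s_{\text{POOR}}$ is $\tilde{z}_{\max}(s_{\text{POOR}})$.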



A. Structure of Optimal Policy for the Finite Horizon Discounted Expected Cost Problem 

We showed in Theorem 1 that the optimal transmission policy to a single receiver in
the case of linear power-rate curves is a modified base-stock policy characterized by a single
critical level for each channel condition. In this section, we generalize this result to the case
of piecewise-linear power-rate curves, and show that the optimal receiver buffer level after
transmission (respectively, optimal number of packets to transmit) is no longer a three-segment
piecewise-linear nondecreasing (respectively, nonincreasing) function of the starting buffer level
as in Figure 3, but a more general piecewise-linear nondecreasing (respectively, nonincreasing)
function.

Theorem 3. In Problem (P1) with a single receiver under piecewise-linear convex power-rate
curves, for every $n \in \{1,2,\ldots,N\}$ and $s \in \mathcal{S}$, there exists a nonincreasing sequence of critical
numbers $\{b_{n,k}(s)\}_{k \in \{-1,0,1,\ldots,K\}}$ such that the optimal number of packets to transmit with $n$ slots
remaining is given by:

\[
z_n^*(x,s) := \begin{cases}
\tilde{z}_{k-1}(s), & \text{if } b_{n,k}(s) - \tilde{z}_{k-1}(s) \le x < b_{n,k-1}(s) - \tilde{z}_{k-1}(s), \quad k \in \{0,1,\ldots,K\} \\
b_{n,k}(s) - x, & \text{if } b_{n,k}(s) - \tilde{z}_k(s) \le x < b_{n,k}(s) - \tilde{z}_{k-1}(s), \quad k \in \{0,1,\ldots,K-1\} \\
b_{n,K}(s) - x, & \text{if } b_{n,K}(s) - \tilde{z}_{\max}(s) \le x < b_{n,K}(s) - \tilde{z}_{K-1}(s) \\
\tilde{z}_{\max}(s), & \text{if } 0 \le x < b_{n,K}(s) - \tilde{z}_{\max}(s)
\end{cases} , \tag{17}
\]

where $b_{n,-1}(s) := \infty$, $\forall s \in \mathcal{S}$. The optimal receiver buffer level after transmission is given by
$y_n^*(x,s) = x + z_n^*(x,s)$.

The optimal transmission policy in Theorem 3 is a finite generalized base-stock policy. It
can be interpreted as follows. Under each channel condition $s$, there is a target level or critical
number associated with each segment of the associated piecewise-linear convex power-rate curve
shown in Figure 4. If the starting buffer level is below the critical number associated with the
first segment, $b_{n,0}(s)$, the scheduler should try to bring the buffer level as close as possible to the
target, $b_{n,0}(s)$. If the maximum number of packets sent at this per packet power cost, $\tilde{z}_0(s)$, does
not suffice to reach the critical number $b_{n,0}(s)$, then those $\tilde{z}_0(s)$ packets are scheduled, and the
next segment of the power-rate curve is considered. This second segment has a slope of $\tilde{c}_1(s)$ and
an associated critical number $b_{n,1}(s)$, which is no higher than $b_{n,0}(s)$, the first critical number.
If the starting buffer level plus the $\tilde{z}_0(s)$ already-scheduled packets brings the buffer level above
$b_{n,1}(s)$, then no more packets are scheduled for transmission. Otherwise, it is optimal to transmit
so as to bring the buffer level as close as possible to $b_{n,1}(s)$ by transmitting up to $\tilde{z}_1(s) - \tilde{z}_0(s)$
additional packets at a cost of $\tilde{c}_1(s)$ power units per packet. This process continues with the
sequential consideration of each segment of the power-rate curve. At each successive iteration, the
target level is lower and the starting buffer level, updated to include already-scheduled packets,
is higher. The process continues until the buffer level reaches or exceeds a critical number, or the
full power $P$ is consumed. Note that this sequential consideration is not actually done online, but
only meant to provide an intuitive explanation of the optimal policy. See Figure 5 for diagrams
of the structure of the optimal finite generalized base-stock policy.
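The segment-by-segment interpretation above translates into a short loop. The Python sketch below is illustrative only: it takes the critical numbers for the current slot and channel condition as given (nonincreasing, one per segment), and the function and argument names are invented for the example.

```python
def transmit_quantity(x, targets, breakpoints, z_max):
    """Packets to send under a finite generalized base-stock policy.

    x: buffer level before transmission.
    targets: critical numbers, one per segment, nonincreasing.
    breakpoints: points where the slope of the power-rate curve changes.
    z_max: most packets that the power limit allows in this slot.
    """
    z = 0.0
    # Cumulative packet limit at the end of each segment; the last segment
    # is cut off by the power constraint rather than by a breakpoint.
    caps = list(breakpoints) + [z_max]
    for b_k, cap in zip(targets, caps):
        if x + z >= b_k:
            break                      # target for this segment already met
        z = min(b_k - x, cap)          # fill toward the target along this segment
    return z
```

With targets `[10, 7, 4]`, breakpoints `[2, 5]`, and `z_max = 6`, starting buffers of 9, 6, 3, and 1 lead to 1, 2, 4, and 5 packets respectively, matching the four corresponding cases of the piecewise definition in Theorem 3. With the higher targets `[12, 10, 8]`, a starting buffer of 1 uses the full power and sends 6 packets.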



[Figure 5: (a) optimal number of packets to transmit and (b) optimal buffer level after transmission,
each plotted against the buffer level before transmission.]

Fig. 5. Optimal transmission policy in slot $n$ when the state is $(x,s)$. (a) depicts the optimal transmission quantity, and (b)
depicts the resulting number of packets available for playout in slot $n$.



B. Computation of Critical Numbers 

While finite generalized base-stock policies have been considered in the inventory literature for
almost three decades, we are not aware of any previous studies that explicitly compute the critical
numbers for any model where such a policy is optimal. In this section, we compute the critical
numbers under each channel condition when technical conditions similar to those of Section
III-B are satisfied. We consider the special case when the channel condition is independent and
identically distributed from slot to slot; the holding cost function is linear (i.e., $h(x) = h \cdot x$); and
the following technical condition on the power-rate functions is satisfied for each possible channel
condition $s \in \mathcal{S}$: $\tilde{z}_{\max}(s) = l_{\max} \cdot d$ for some $l_{\max} \in \mathbb{N}$, and for every $k \in \{0,1,\ldots,K-1\}$,
$\tilde{z}_k(s) = l_k \cdot d$ for some $l_k \in \mathbb{N}$; i.e., the slopes of the effective power-rate functions only change at
integer multiples of the drainage rate $d$. Under these conditions, we can completely characterize
the optimal transmission policy.

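As in Theorem 2, we recursively define a set of thresholds, and use them to determine the
critical numbers, $\{b_{n,k}(s)\}_{k \in \{-1,0,\ldots,K\}}$, for each channel condition, at each time.

Theorem 4. Define the thresholds $\gamma_{n,j}$ for $n \in \{1,2,\ldots,N\}$ and $j \in \mathbb{N}$ recursively, as follows:

(i) If $j = 1$, $\gamma_{n,j} = \infty$;

(ii) If $j > n$, $\gamma_{n,j} = 0$;

(iii) If $2 \le j \le n$,

\begin{align}
\gamma_{n,j} = -h + \alpha \cdot \Bigg( & \sum_{s:\, \tilde{c}_0(s) \ge \gamma_{n-1,j-1}} p(s) \cdot \gamma_{n-1,j-1} \notag \\
& + \sum_{k=0}^{K-1} \Bigg\{ \sum_{s:\, \gamma_{n-1,j-1+L_k(s)} \le \tilde{c}_k(s) < \gamma_{n-1,j-1+L_{k-1}(s)}} p(s) \cdot \tilde{c}_k(s) + \sum_{s:\, \tilde{c}_k(s) < \gamma_{n-1,j-1+L_k(s)} \le \tilde{c}_{k+1}(s)} p(s) \cdot \gamma_{n-1,j-1+L_k(s)} \Bigg\} \notag \\
& + \sum_{s:\, \gamma_{n-1,j-1+L_{\max}(s)} \le \tilde{c}_K(s) < \gamma_{n-1,j-1+L_{K-1}(s)}} p(s) \cdot \tilde{c}_K(s) + \sum_{s:\, \tilde{c}_K(s) < \gamma_{n-1,j-1+L_{\max}(s)}} p(s) \cdot \gamma_{n-1,j-1+L_{\max}(s)} \Bigg) , \tag{18}
\end{align}

where $p(s)$ is the probability of the channel being in state $s$ in a time slot, $L_k(s) := \frac{\tilde{z}_k(s)}{d}$
for all $s \in \mathcal{S}$ and $k \in \{0,1,\ldots,K-1\}$, and $L_{\max}(s) := \frac{\tilde{z}_{\max}(s)}{d}$ for all $s \in \mathcal{S}$. For each
$n \in \{1,2,\ldots,N\}$ and $s \in \mathcal{S}$, define $b_{n,-1}(s) := \infty$ and for all $k \in \{0,1,\ldots,K\}$, if $\gamma_{n,j+1} \le
\tilde{c}_k(s) < \gamma_{n,j}$, define $b_{n,k}(s) := j \cdot d$. The optimal control strategy for Problem (P1) is then given
by $\pi^* = \{z_N^*, z_{N-1}^*, \ldots, z_1^*\}$, where for all $n \in \{N, N-1, \ldots, 1\}$, $z_n^*(x,s)$ is given by (17).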



It is straightforward to check that Theorem 4 is in fact a generalization of Theorem 2. To see
this, set $K = 0$ so that the summation from $k = 0$ to $k = K - 1$ on the right-hand side of (18)
drops out. Then $\gamma_{n,j}$ in (18) is the same as $\gamma_{n,j}$ in (10), $\tilde{c}_0(s)$ corresponds to $c_s$ in (10), $b_{n,0}(s)$
corresponds to $b_n(s)$, $\tilde{z}_{\max}(s)$ corresponds to $\frac{P}{c_s}$, $L_{\max}(s)$ corresponds to $L(s)$, and
$L_{-1}(s) = \frac{\tilde{z}_{-1}(s)}{d} = 0$. The resulting optimal transmission policies are also the same.

In Theorem 4, the threshold $\gamma_{n,j}$ may again be interpreted as the per packet power cost at
which, with $n$ slots remaining in the horizon, the expected cost-to-go of transmitting packets to
cover the user's playout requirements for the next $j - 1$ slots is the same as the expected
cost-to-go of transmitting packets to cover the user's requirements for the next $j$ slots. The intuition
behind the recursion (18) is similar to the detailed explanation given in Section III-B. Namely,
we can start with equation (12) and expand out the right-hand side based on the known structure
of the optimal policy, until, after a fair bit of algebra, the result is (18). A detailed proof of
Theorem 4 is included in Appendix A.
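As a rough sketch of how the thresholds of Theorem 4 might be computed, the Python function below evaluates the recursion for piecewise-linear convex power-rate curves. It rests on a reading of the recursion in which, for each segment, a state contributes its slope when the slope lies between the two neighboring thresholds and contributes the threshold itself when that threshold lies between two consecutive slopes; that reading, the tuple layout of `states`, and all names are assumptions of this example. The theorem's conditions (IID channel, linear holding cost, breakpoints and per-slot maximum at integer multiples of d) are assumed to hold.

```python
import math


def compute_thresholds_pwl(N, d, states, h, alpha):
    """Return gamma(n, j) for piecewise-linear convex power-rate curves.

    states: list of (prob, slopes, breakpoints, z_max) per channel condition,
    with K + 1 slopes and K breakpoints (hypothetical layout).
    """
    table = {}

    def gamma(n, j):
        if j == 1:
            return math.inf
        if j > n:
            return 0.0
        return table[(n, j)]

    for n in range(2, N + 1):
        for j in range(2, n + 1):
            total = 0.0
            for p, slopes, bps, z_max in states:
                K = len(slopes) - 1
                # Offsets 0, z_0/d, ..., z_{K-1}/d, z_max/d (in units of d).
                L = [0] + [round(z / d) for z in bps] + [round(z_max / d)]
                g = [gamma(n - 1, j - 1 + l) for l in L]
                if slopes[0] >= g[0]:
                    total += p * g[0]              # no transmission at all
                for k in range(K + 1):
                    upper, lower = g[k], g[k + 1]
                    if lower <= slopes[k] < upper:
                        total += p * slopes[k]     # marginal packet on segment k
                    nxt = slopes[k + 1] if k < K else math.inf
                    if slopes[k] < lower <= nxt:
                        total += p * lower         # stuck at the end of segment k
            table[(n, j)] = -h + alpha * total
    return gamma
```

With a single segment per state (`K = 0`), the sketch reproduces the thresholds of the linear case; for example, two equally likely states with slopes 1 and 2, per-slot maxima of 2 and 1 packets, `d = 1`, `h = 0`, and `alpha = 1` give `gamma(4, 2) = 1.625`. A two-segment example with states `(0.5, [1, 3], [1], 2)` and `(0.5, [2, 2], [1], 2)` gives `gamma(3, 2) = 1.75` and `gamma(4, 2) = 1.875`, which agree with a direct hand evaluation of the dynamic program on integer buffer levels.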

C. Structure of the Optimal Policy for the Infinite Horizon Discounted Expected Cost Problems 

In this section, we show that the optimal policy for the infinite horizon discounted expected 
cost problem is the natural extension of the optimal policy for the finite horizon discounted 
expected cost problem; namely, it is a finite generalized base-stock policy characterized by 
time-invariant sequences of critical numbers for each channel condition. These time-invariant 
sequences of critical numbers for the infinite horizon discounted expected cost problem are equal 
to the limit of the finite horizon sequences of critical numbers as the time horizon N goes to 
infinity. 

Theorem 5. 

(a) For a fixed x G -K+ and s E S, s) is nondecreasing in n. Moreoever, lim V^(x, s) 

n— ^-oo 

exists and is finite, Wx G -K+,Vs G S. 

(b) Define Voo{x,s) := lim Vn{x,s). Then Voo{x,s) is convex in x for any fixed s E S. 

n—^oo 

(c) Define Qooiy, s) := h{y — d) + a ■ IE \Voo iy — d, S') \ S = s], where S' is the channel 
condition in the subsequent slot. Then gn{y,s) converges monotonically to goo{y, s),^y G 
[d, oo), Vs G S; (jooiy, s) is convex in y for any fixed s G S; and lim cjooiy, s) = oo, Vs G S. 

3/— >oo 

(d) Define boo-i{s) := oo and 

&oo,fc(s) := max|(i,inf{6 | g'^{b,s) > -Cfc(s)}| , Vfc G {0, 1, . . . , A'} , 

where g'^{b, s) represents the right derivative: 

9^ib, s) := hm . 

yib y — b 

Then 6oo,fc(s) = lim bn,k{s) for all k G { — 1, 0, 1, ... , K}. 
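(e) $V_\infty(x,s)$ satisfies the $\alpha$-discounted cost optimality equation ($\alpha$-DCOE):

\[
\begin{aligned}
V_\infty(x,s) &= \min_{\max\{0,d-x\} \leq z \leq \tilde{z}_{\max}(s)} \left\{ c(z,s) + h(x+z-d) + \alpha \cdot \mathbb{E}\left[V_\infty(x+z-d,S') \mid S = s\right] \right\} \\
&= \min_{\max\{0,d-x\} \leq z \leq \tilde{z}_{\max}(s)} \left\{ c(z,s) + g_\infty(x+z,s) \right\}, \quad \forall x \in \mathbb{R}_+, \forall s \in \mathcal{S},
\end{aligned}
\tag{19}
\]

and the minimum on the right-hand side of (19) is achieved by

\[
z^*_\infty(x,s) :=
\begin{cases}
\tilde{z}_{k-1}(s), & \text{if } b_{\infty,k}(s) - \tilde{z}_{k-1}(s) < x \leq b_{\infty,k-1}(s) - \tilde{z}_{k-1}(s), \; k \in \{0,1,\ldots,K\} \\
b_{\infty,k}(s) - x, & \text{if } b_{\infty,k}(s) - \tilde{z}_{k}(s) < x \leq b_{\infty,k}(s) - \tilde{z}_{k-1}(s), \; k \in \{0,1,\ldots,K-1\} \\
b_{\infty,K}(s) - x, & \text{if } b_{\infty,K}(s) - \tilde{z}_{\max}(s) < x \leq b_{\infty,K}(s) - \tilde{z}_{K-1}(s) \\
\tilde{z}_{\max}(s), & \text{if } 0 \leq x \leq b_{\infty,K}(s) - \tilde{z}_{\max}(s)
\end{cases}
\]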

(f) The optimal stationary policy for Problem (P2) in the case of a single receiver with piecewise-linear convex power-rate curves is given by $\pi^*_\infty = (z^*_\infty, z^*_\infty, \ldots)$.
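The case analysis in part (e) of Theorem 5 can be sketched in code. The following is a minimal illustration, not the authors' implementation: the function name, the argument layout, and the convention that the breakpoint before the first segment equals zero are assumptions made here, and the critical numbers are taken as given inputs (assumed nonincreasing in $k$) rather than computed.

```python
import math

def generalized_base_stock(x, b, z_break, z_max):
    """Transmission quantity under a finite generalized base-stock policy.

    Hypothetical helper for illustration only.
    x       : buffer level before transmission (x >= 0)
    b       : critical numbers [b_0, ..., b_K], assumed nonincreasing in k
    z_break : breakpoints [z_0, ..., z_{K-1}] of the power-rate curve
    z_max   : maximum number of packets that can be sent in the slot
    """
    K = len(b) - 1
    lower = [0.0] + list(z_break)    # z_{k-1} for k = 0..K, with z_{-1} = 0 (assumed)
    upper = list(z_break) + [z_max]  # z_k for k < K, and z_max for k = K
    b_prev = math.inf                # b_{-1} := infinity
    for k in range(K + 1):
        # Stay at the breakpoint z_{k-1}: segment k is too expensive to use.
        if b[k] - lower[k] < x <= b_prev - lower[k]:
            return lower[k]
        # Bring the buffer level up to the critical number b_k.
        if b[k] - upper[k] < x <= b[k] - lower[k]:
            return b[k] - x
        b_prev = b[k]
    # Even the maximum transmission cannot reach b_K.
    return z_max
```

For example, with two segments, `b = [10, 6]`, `z_break = [2]`, and `z_max = 5`, a starting level of 9 leads to sending 1 packet (up to the first critical number), while a starting level of 7 leads to sending exactly 2 packets (the full first segment and nothing on the second).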



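A detailed proof, which follows the logic conveyed in the statement of the theorem, is included in Appendix B. As a special case of Theorem 5, the optimal policy in Problem (P2) for the case discussed in Section III of a single receiver with linear power-rate curves is given by $\pi^*_\infty = (z^*_\infty, z^*_\infty, \ldots)$, where:

\[
z^*_\infty(x,s) :=
\begin{cases}
0, & \text{if } x \geq b_\infty(s) \\
b_\infty(s) - x, & \text{if } b_\infty(s) - \frac{P}{c_s} \leq x < b_\infty(s) \\
\frac{P}{c_s}, & \text{if } x < b_\infty(s) - \frac{P}{c_s}
\end{cases}
\]

and $b_\infty(s) := \lim_{n \to \infty} b_n(s)$.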
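For the linear special case, the modified base-stock rule reduces to a clamp: transmit enough to reach the critical number, but never a negative amount and never more than the power budget allows. A minimal sketch follows, assuming the critical number `b`, the power limit `P`, and the per-packet power cost `c` for the current channel condition are already known; the function name is hypothetical.

```python
def modified_base_stock(x, b, P, c):
    """Packets to send under a modified base-stock policy (illustrative only).

    x : buffer level before transmission
    b : critical number for the current channel condition
    P : per-slot power limit
    c : power cost per packet under the current channel condition
    """
    # 0 if x >= b; b - x if b - P/c <= x < b; P/c if x < b - P/c.
    return min(max(b - x, 0.0), P / c)
```

With `b = 5`, `P = 4`, and `c = 2`, a buffer at level 6 receives nothing, a buffer at level 4 receives 1 packet, and a buffer at level 1 receives the power-limited maximum of 2 packets.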



D. Structure of the Optimal Policy for the Infinite Horizon Average Expected Cost Problems 

In this section we use the vanishing discount approach to show that the finite generalized 
base-stock structure is also optimal for the infinite horizon average expected cost problem, (P3). 
We show that an optimal policy for the infinite horizon average expected cost problem exists
and can be represented as the limit, as the discount factor increases to one, of optimal policies
identified in Section IV-C for the infinite horizon discounted expected cost problem.



In Section IV-C , we suppressed the dependence of the value functions and optimal policies on 
the discount factor, a. Here, we make this dependence explicit by including the discount factor 
in the subscript labeling of the value functions and optimal policies for the infinite horizon 
discounted expected cost problem. For example, the value function defined in (b) of Theorem 5
is now denoted by $V_{\infty,\alpha}(x,s)$.

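Theorem 6. For all $\alpha \in [0,1)$, define:

\[ m_{\infty,\alpha} := \inf_{(x,s) \in \mathbb{R}_+ \times \mathcal{S}} V_{\infty,\alpha}(x,s), \]
\[ \rho^* := \lim_{\alpha \nearrow 1} (1-\alpha) \cdot m_{\infty,\alpha}, \text{ and} \]
\[ w_{\infty,\alpha}(x,s) := V_{\infty,\alpha}(x,s) - m_{\infty,\alpha}, \quad \forall x \in \mathbb{R}_+, \forall s \in \mathcal{S}. \]

Then:

(a) There exists a continuous function $w_{\infty,1}(\cdot,\cdot)$ and a selector $z^*_{\infty,1}(\cdot,\cdot)$ that satisfy the ACOE:

\[
\begin{aligned}
\rho^* + w_{\infty,1}(x,s) &= \min_{\max\{0,d-x\} \leq z \leq \tilde{z}_{\max}(s)} \left\{ c(z,s) + h(x+z-d) + \mathbb{E}\left[w_{\infty,1}(x+z-d,S') \mid S = s\right] \right\} \\
&= c\left(z^*_{\infty,1}(x,s),s\right) + h\left(x + z^*_{\infty,1}(x,s) - d\right) + \mathbb{E}\left[w_{\infty,1}\left(x + z^*_{\infty,1}(x,s) - d, S'\right) \mid S = s\right], \quad \forall x \in \mathbb{R}_+, \forall s \in \mathcal{S}.
\end{aligned}
\]

(b) The stationary policy $\pi^*_{\infty,1} = (z^*_{\infty,1}, z^*_{\infty,1}, \ldots)$ is optimal for Problem (P3) in the case of a single receiver with piecewise-linear convex power-rate curves.

(c) The resulting optimal average cost beginning from any initial state $(x,s) \in \mathbb{R}_+ \times \mathcal{S}$ is $\rho^*$.

(d) For every increasing sequence of discount factors $\{\alpha(l)\}_{l=1,2,\ldots}$ approaching 1, there exists a subsequence $\{\alpha(l_i)\}_{i=1,2,\ldots}$ approaching 1 such that:

\[ w_{\infty,1}(x,s) = \lim_{i \to \infty} w_{\infty,\alpha(l_i)}(x,s), \quad \forall x \in \mathbb{R}_+, \forall s \in \mathcal{S}. \]

Therefore, for every $s \in \mathcal{S}$, $w_{\infty,1}(x,s)$ is convex in $x$.

(e) For every $(x,s) \in \mathbb{R}_+ \times \mathcal{S}$ and increasing sequence of discount factors $\{\alpha(l)\}_{l=1,2,\ldots}$ approaching 1, there exists a subsequence $\{\alpha(l_i)\}_{i=1,2,\ldots}$ approaching 1 and a sequence $\{x(i)\}_{i=1,2,\ldots}$ approaching $x$ such that:

\[ z^*_{\infty,1}(x,s) = \lim_{i \to \infty} z^*_{\infty,\alpha(l_i)}(x(i),s). \]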

(f) A stationary finite generalized base-stock policy is average cost optimal in the case of 
piecewise-linear convex power-rate curves, and a stationary modified base-stock policy is 
average cost optimal in the case of linear power-rate curves. 

Thus, the structure of the optimal policy is the same for all three problems, (P1), (P2), and
(P3). The proof of Theorem 6 is discussed in Appendix C.

E. General Convex Power-Rate Curves 



As mentioned in Section II-A, in general, the power-rate curve under each possible channel 
condition is convex. It can be shown that under convex power-rate curves at each time, the optimal 
number of packets to send is a nonincreasing function of the starting buffer level. However, 
without any further structure on the power-rate curves, it is not computationally tractable to 
compute such optimal policies, known as generalized base-stock policies (a superclass of the 
finite generalized base-stock policies discussed above). This is why we have chosen to analyze 
piecewise-linear convex power-rate curves, which can be used to approximate general convex 
power-rate curves. More specifically, our analysis suggests approximating the general convex 
power-rate curves by piecewise-linear convex power-rate curves where the slopes change at 
integer multiples of the demand $d$, so that Theorem 4 can be applied to compute the
critical numbers efficiently. Doing so represents an approximation at the
modeling stage followed by an exact solution, as compared to modeling the power-rate curves
as more general convex functions and having to approximate the solution. Finally, we note that 
increasing the number of segments used to model the piecewise-linear convex functions leads 
to a better approximation, but comes at the cost of some extra complexity in implementing the 
optimal policy, as the scheduler needs to store at least one critical number for each segment of 
each power-rate curve. 

V. Two Receivers with Linear Power-Rate Curves 

In this section, we analyze the finite and infinite horizon discounted expected cost problems
when there are two receivers ($M = 2$), and the power-rate functions under different channel
conditions are linear for each user. Each user $m$'s channel condition evolves as a homogeneous
Markov process, $\{S_n^m\}$. As discussed earlier (see Section I), the time-varying channel
conditions of the two users are independent of each other, and the transmission scheduler can
exploit this spatial diversity. As in Section III, we denote the power consumption per unit of data
transmitted to receiver $m$ under channel condition $s^m$ by $c^m_{s^m}$. The row vector of these per unit
power consumptions is given by $\mathbf{c}_{\mathbf{s}}^T$, so that the total power consumption in slot $n$ is given by
$\sum_{m=1}^{2} c^m(Z_n^m, S_n^m) = \mathbf{c}_{\mathbf{S}_n}^T \mathbf{Z}_n$. We denote the total holding costs $\sum_{m=1}^{2} h^m(X_n^m + Z_n^m - d^m)$ by
$h(\mathbf{X}_n + \mathbf{Z}_n - \mathbf{d})$.

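With this notation, the dynamic program for Problem (P1) becomes:

\[
\begin{aligned}
V_n(\mathbf{x},\mathbf{s}) &= \min_{\mathbf{z} \succeq (\mathbf{d}-\mathbf{x}) \vee \mathbf{0},\; \mathbf{c}_{\mathbf{s}}^T \mathbf{z} \leq P} \left\{ \mathbf{c}_{\mathbf{s}}^T \mathbf{z} + h(\mathbf{x}+\mathbf{z}-\mathbf{d}) + \alpha \cdot \mathbb{E}\left[V_{n-1}(\mathbf{x}+\mathbf{z}-\mathbf{d},\mathbf{S}_{n-1}) \mid \mathbf{S}_n = \mathbf{s}\right] \right\} && (20) \\
&= \min_{\mathbf{y} \in \mathcal{A}^{\mathbf{d}}(\mathbf{x},\mathbf{s})} \left\{ \mathbf{c}_{\mathbf{s}}^T [\mathbf{y}-\mathbf{x}] + h(\mathbf{y}-\mathbf{d}) + \alpha \cdot \mathbb{E}\left[V_{n-1}(\mathbf{y}-\mathbf{d},\mathbf{S}_{n-1}) \mid \mathbf{S}_n = \mathbf{s}\right] \right\} && (21) \\
&= -\mathbf{c}_{\mathbf{s}}^T \mathbf{x} + \min_{\mathbf{y} \in \mathcal{A}^{\mathbf{d}}(\mathbf{x},\mathbf{s})} \left\{ G_n(\mathbf{y},\mathbf{s}) \right\}, \quad n = N, N-1, \ldots, 1,
\end{aligned}
\]
\[ V_0(\mathbf{x},\mathbf{s}) = 0, \quad \forall \mathbf{x} \in \mathbb{R}^2_+, \forall \mathbf{s} \in \mathcal{S} := \mathcal{S}^1 \times \mathcal{S}^2, \]

where

\[ G_n(\mathbf{y},\mathbf{s}) := \mathbf{c}_{\mathbf{s}}^T \mathbf{y} + h(\mathbf{y}-\mathbf{d}) + \alpha \cdot \mathbb{E}\left[V_{n-1}(\mathbf{y}-\mathbf{d},\mathbf{S}_{n-1}) \mid \mathbf{S}_n = \mathbf{s}\right], \quad \forall \mathbf{y} \in [d^1,\infty) \times [d^2,\infty), \forall \mathbf{s} \in \mathcal{S}, \text{ and} \]
\[ \mathcal{A}^{\mathbf{d}}(\mathbf{x},\mathbf{s}) := \left\{ \mathbf{y} \in \mathbb{R}^2_+ : \mathbf{y} \succeq \mathbf{d} \vee \mathbf{x} \text{ and } \mathbf{c}_{\mathbf{s}}^T [\mathbf{y}-\mathbf{x}] \leq P \right\}, \quad \forall \mathbf{x} \in \mathbb{R}^2_+, \forall \mathbf{s} \in \mathcal{S}. \tag{22} \]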



The transition from (20) to (21) follows again from a change of variable in the action space from
$\mathbf{Z}_n$ to $\mathbf{Y}_n$, where $\mathbf{Y}_n = \mathbf{X}_n + \mathbf{Z}_n$. The controlled random vector $\mathbf{Y}_n$ represents the queue lengths
of the receiver buffers after transmission takes place in the $n$-th slot, but before playout takes
place (i.e., before $d^m$ packets are removed from user $m$'s buffer). The restrictions on the action
space, $\mathbf{y} \succeq \mathbf{d} \vee \mathbf{x}$ and $\mathbf{c}_{\mathbf{s}}^T [\mathbf{y}-\mathbf{x}] \leq P$, ensure: (i) a nonnegative number of packets is transmitted
to each user; (ii) there are at least $d^m$ packets in user $m$'s receiver buffer following transmission,
in order to satisfy the underflow constraint; and (iii) the power constraint is satisfied.
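The three restrictions on the action space translate directly into a membership test. The sketch below is illustrative only: the function name and the small numerical tolerance are assumptions introduced here, and the vectors are plain Python sequences with one entry per receiver.

```python
def is_feasible(y, x, d, c, P, tol=1e-9):
    """Check whether post-transmission buffer levels y lie in the action set.

    Hypothetical helper for illustration.
    y : buffer levels after transmission, before playout
    x : buffer levels before transmission
    d : per-slot demand (packets drained) for each receiver
    c : power cost per packet for each receiver in the current channel state
    P : per-slot peak power limit
    """
    # (i) a nonnegative number of packets is sent to each receiver
    nonnegative = all(yi >= xi - tol for yi, xi in zip(y, x))
    # (ii) each buffer holds at least one slot's demand after transmission
    no_underflow = all(yi >= di - tol for yi, di in zip(y, d))
    # (iii) total power used does not exceed the budget
    power_used = sum(ci * (yi - xi) for ci, yi, xi in zip(c, y, x))
    return nonnegative and no_underflow and power_used <= P + tol
```

The tolerance only guards against floating-point rounding at the boundary of the power constraint; it is not part of the model.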

Without the per slot peak power constraint, this $M$-dimensional problem would be separable,
and could be solved by solving $M$ instances of the one-dimensional problem of Section III;
however, the joint power constraint couples the queues.$^{9}$ As a result, the optimal transmission
quantity to one receiver depends on the other receiver's queue length, as the following example
shows.

Example 1. Assume receiver 1's channel is currently in a "poor" condition, receiver 2's channel
is currently in a "medium" condition, and receiver 2's buffer contains enough packets to satisfy
the demand for the next few slots. We consider two different scenarios for receiver 1's buffer
level to show how the optimal transmission quantity to receiver 2 depends on receiver 1's buffer
level. In Scenario 1, receiver 1's buffer already contains many packets. In this scenario, it may
be beneficial for the scheduler to wait for receiver 2 to have a better channel condition, because
it will be able to take full advantage of an "excellent" condition when it comes. In Scenario
2, receiver 1's queue only contains enough packets for playout in the current slot. It may be
optimal to transmit some packets to receiver 2 in the current slot in this scenario. To see this,
note that even if receiver 2 experiences the best possible channel condition in the next slot,
the scheduler will need to allocate some power to receiver 1 in order to prevent receiver 1's
buffer from emptying. Therefore, the scheduler anticipates not being able to take full advantage
of receiver 2's "excellent" condition in the next slot, and may compensate by sending some
packets in the current slot under the "medium" condition.

$^{9}$ This problem therefore falls into the class of weakly coupled stochastic dynamic programs [65], [66].

A. Structure of Optimal Policy for the Finite Horizon Discounted Expected Cost Problem 

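Before proceeding to the structure of the optimal transmission policy, we state some key
properties of the value functions in the following theorem.

Theorem 7. With two receivers and linear power-rate curves, the following statements are true
for $n = 1,2,\ldots,N$, and for all $\mathbf{s} \in \mathcal{S}$:

(i) $V_{n-1}(\mathbf{x},\mathbf{s})$ is convex in $\mathbf{x}$.

(ii) $V_{n-1}(\mathbf{x},\mathbf{s})$ is supermodular in $\mathbf{x}$; i.e., for all $\mathbf{x}, \bar{\mathbf{x}} \in \mathbb{R}^2_+$,

\[ V_{n-1}(\mathbf{x},\mathbf{s}) + V_{n-1}(\bar{\mathbf{x}},\mathbf{s}) \leq V_{n-1}(\mathbf{x} \wedge \bar{\mathbf{x}},\mathbf{s}) + V_{n-1}(\mathbf{x} \vee \bar{\mathbf{x}},\mathbf{s}). \]

(iii) $G_n(\mathbf{y},\mathbf{s})$ is convex in $\mathbf{y}$.

(iv) $G_n(\mathbf{y},\mathbf{s})$ is supermodular in $\mathbf{y}$; i.e., for all $\mathbf{y}, \bar{\mathbf{y}} \in [d^1,\infty) \times [d^2,\infty)$,

\[ G_n(\mathbf{y},\mathbf{s}) + G_n(\bar{\mathbf{y}},\mathbf{s}) \leq G_n(\mathbf{y} \wedge \bar{\mathbf{y}},\mathbf{s}) + G_n(\mathbf{y} \vee \bar{\mathbf{y}},\mathbf{s}). \]

(v) $y^1 \leq \bar{y}^1$ implies:

\[ \inf\left\{ \arg\min_{y^2 \in [d^2,\infty)} \left\{ G_n(y^1,y^2,s^1,s^2) \right\} \right\} \geq \inf\left\{ \arg\min_{y^2 \in [d^2,\infty)} \left\{ G_n(\bar{y}^1,y^2,s^1,s^2) \right\} \right\}, \]

and $y^2 \leq \bar{y}^2$ implies:

\[ \inf\left\{ \arg\min_{y^1 \in [d^1,\infty)} \left\{ G_n(y^1,y^2,s^1,s^2) \right\} \right\} \geq \inf\left\{ \arg\min_{y^1 \in [d^1,\infty)} \left\{ G_n(y^1,\bar{y}^2,s^1,s^2) \right\} \right\}. \]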
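The supermodularity inequality in statements (ii) and (iv) of Theorem 7 can be spot-checked numerically on a finite grid. The sketch below is only a sanity check under that convention, not a proof, and the function name and grid-based approach are assumptions introduced here for illustration.

```python
from itertools import product

def is_supermodular_on_grid(f, grid1, grid2, tol=1e-12):
    """Test f(a) + f(b) <= f(a meet b) + f(a join b) for all grid pairs.

    f            : function of two scalar arguments
    grid1, grid2 : iterables of sample points for each coordinate
    """
    points = list(product(grid1, grid2))
    for (a1, a2), (b1, b2) in product(points, points):
        meet = (min(a1, b1), min(a2, b2))  # componentwise minimum
        join = (max(a1, b1), max(a2, b2))  # componentwise maximum
        if f(a1, a2) + f(b1, b2) > f(*meet) + f(*join) + tol:
            return False
    return True
```

For instance, $f(y^1,y^2) = y^1 y^2$ passes the check on a nonnegative grid, whereas $f(y^1,y^2) = -y^1 y^2$ fails it. Passing on a grid does not establish supermodularity everywhere, but a failure is a valid counterexample.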
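A detailed proof is included in Appendix A. Because $-\mathbf{c}_{\mathbf{s}}^T \mathbf{x}$ is supermodular in $\mathbf{x}$, the key
part of the induction step in the proof of (ii) is to show that $\min_{\mathbf{y} \in \mathcal{A}^{\mathbf{d}}(\mathbf{x},\mathbf{s})} \{G_{n-1}(\mathbf{y},\mathbf{s})\}$ is also
supermodular in $\mathbf{x}$. Denoting $\arg\min_{\mathbf{y} \in \mathcal{A}^{\mathbf{d}}(\mathbf{x},\mathbf{s})} \{G_{n-1}(\mathbf{y},\mathbf{s})\}$ by $\mathbf{y}^*(\mathbf{x},\mathbf{s})$, we do this constructively
by showing that for all $\mathbf{x}, \bar{\mathbf{x}} \in \mathbb{R}^2_+$:

\[
\begin{aligned}
&\min_{\mathbf{y} \in \mathcal{A}^{\mathbf{d}}(\mathbf{x},\mathbf{s})} \{G_{n-1}(\mathbf{y},\mathbf{s})\} + \min_{\mathbf{y} \in \mathcal{A}^{\mathbf{d}}(\bar{\mathbf{x}},\mathbf{s})} \{G_{n-1}(\mathbf{y},\mathbf{s})\} \\
&\quad \leq G_{n-1}(\hat{\mathbf{y}},\mathbf{s}) + G_{n-1}(\tilde{\mathbf{y}},\mathbf{s}) \\
&\quad \leq G_{n-1}(\mathbf{y}^*(\mathbf{x} \wedge \bar{\mathbf{x}},\mathbf{s}),\mathbf{s}) + G_{n-1}(\mathbf{y}^*(\mathbf{x} \vee \bar{\mathbf{x}},\mathbf{s}),\mathbf{s}) \\
&\quad = \min_{\mathbf{y} \in \mathcal{A}^{\mathbf{d}}(\mathbf{x} \wedge \bar{\mathbf{x}},\mathbf{s})} \{G_{n-1}(\mathbf{y},\mathbf{s})\} + \min_{\mathbf{y} \in \mathcal{A}^{\mathbf{d}}(\mathbf{x} \vee \bar{\mathbf{x}},\mathbf{s})} \{G_{n-1}(\mathbf{y},\mathbf{s})\},
\end{aligned}
\tag{23}
\]

for a specific choice of $\hat{\mathbf{y}} \in \mathcal{A}^{\mathbf{d}}(\mathbf{x},\mathbf{s})$ and $\tilde{\mathbf{y}} \in \mathcal{A}^{\mathbf{d}}(\bar{\mathbf{x}},\mathbf{s})$. The difficulty lies in constructing $\hat{\mathbf{y}}$
and $\tilde{\mathbf{y}}$, depending on the relative locations of $\mathbf{x}$, $\bar{\mathbf{x}}$, $\mathbf{y}^*(\mathbf{x} \wedge \bar{\mathbf{x}},\mathbf{s})$, and $\mathbf{y}^*(\mathbf{x} \vee \bar{\mathbf{x}},\mathbf{s})$, so as to ensure (23)
is true.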

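It follows from Theorem 7 that the structure of the optimal transmission policy for the finite
horizon discounted expected cost problem is given by the following theorem.

Theorem 8. For every $n \in \{1,2,\ldots,N\}$ and $\mathbf{s} \in \mathcal{S}^1 \times \mathcal{S}^2$, define the nonempty set of global
minimizers of $G_n(\cdot,\mathbf{s})$:

\[ \mathcal{B}_n(\mathbf{s}) := \left\{ \hat{\mathbf{y}} \in [d^1,\infty) \times [d^2,\infty) : G_n(\hat{\mathbf{y}},\mathbf{s}) = \min_{\mathbf{y} \in [d^1,\infty) \times [d^2,\infty)} G_n(\mathbf{y},\mathbf{s}) \right\}. \]

Define also

\[ b_n^1(\mathbf{s}) := \min\left\{ y^1 \in [d^1,\infty) : (y^1,y^2) \in \mathcal{B}_n(\mathbf{s}) \text{ for some } y^2 \in [d^2,\infty) \right\}, \]

and

\[ b_n^2(\mathbf{s}) := \min\left\{ y^2 \in [d^2,\infty) : (b_n^1(\mathbf{s}),y^2) \in \mathcal{B}_n(\mathbf{s}) \right\}. \]

Then the vector $\mathbf{b}_n(\mathbf{s}) = (b_n^1(\mathbf{s}), b_n^2(\mathbf{s})) \in \mathcal{B}_n(\mathbf{s})$ is a global minimizer of $G_n(\cdot,\mathbf{s})$. Define also
the functions:

\[ f_n^1(x^2,\mathbf{s}) := \inf\left\{ \arg\min_{y^1 \in [d^1,\infty)} \left\{ G_n(y^1,x^2,s^1,s^2) \right\} \right\}, \text{ for } x^2 \in [d^2,\infty), \text{ and} \]
\[ f_n^2(x^1,\mathbf{s}) := \inf\left\{ \arg\min_{y^2 \in [d^2,\infty)} \left\{ G_n(x^1,y^2,s^1,s^2) \right\} \right\}, \text{ for } x^1 \in [d^1,\infty). \]

Note that by construction, $f_n^1(b_n^2(\mathbf{s}),\mathbf{s}) = b_n^1(\mathbf{s})$ and $f_n^2(b_n^1(\mathbf{s}),\mathbf{s}) = b_n^2(\mathbf{s})$. Partition $\mathbb{R}^2_+$ into the
following seven regions:

\[
\begin{aligned}
\mathcal{R}_{I}(n,\mathbf{s}) &:= \left\{ \mathbf{x} \in \mathbb{R}^2_+ : \mathbf{x} \succeq \left( f_n^1(x^2,\mathbf{s}), f_n^2(x^1,\mathbf{s}) \right) \text{ and } \mathbf{x} \neq \mathbf{b}_n(\mathbf{s}) \right\} \\
\mathcal{R}_{II}(n,\mathbf{s}) &:= \left\{ \mathbf{x} \in \mathbb{R}^2_+ : \mathbf{x} \preceq \mathbf{b}_n(\mathbf{s}) \text{ and } \mathbf{c}_{\mathbf{s}}^T [\mathbf{b}_n(\mathbf{s}) - \mathbf{x}] \leq P \right\} \\
\mathcal{R}_{III-A}(n,\mathbf{s}) &:= \left\{ \mathbf{x} \in \mathbb{R}^2_+ : x^2 > b_n^2(\mathbf{s}) \text{ and } f_n^1(x^2,\mathbf{s}) - \frac{P}{c^1_{s^1}} \leq x^1 < f_n^1(x^2,\mathbf{s}) \right\} \\
\mathcal{R}_{III-B}(n,\mathbf{s}) &:= \left\{ \mathbf{x} \in \mathbb{R}^2_+ : x^1 > b_n^1(\mathbf{s}) \text{ and } f_n^2(x^1,\mathbf{s}) - \frac{P}{c^2_{s^2}} \leq x^2 < f_n^2(x^1,\mathbf{s}) \right\} \\
\mathcal{R}_{IV-A}(n,\mathbf{s}) &:= \left\{ \mathbf{x} \in \mathbb{R}^2_+ : x^2 > b_n^2(\mathbf{s}) \text{ and } x^1 < f_n^1(x^2,\mathbf{s}) - \frac{P}{c^1_{s^1}} \right\} \\
\mathcal{R}_{IV-B}(n,\mathbf{s}) &:= \left\{ \mathbf{x} \in \mathbb{R}^2_+ : \mathbf{x} \preceq \mathbf{b}_n(\mathbf{s}) \text{ and } \mathbf{c}_{\mathbf{s}}^T [\mathbf{b}_n(\mathbf{s}) - \mathbf{x}] > P \right\} \\
\mathcal{R}_{IV-C}(n,\mathbf{s}) &:= \left\{ \mathbf{x} \in \mathbb{R}^2_+ : x^1 > b_n^1(\mathbf{s}) \text{ and } x^2 < f_n^2(x^1,\mathbf{s}) - \frac{P}{c^2_{s^2}} \right\},
\end{aligned}
\]

and define $\mathcal{R}_{IV}(n,\mathbf{s}) := \mathcal{R}_{IV-A}(n,\mathbf{s}) \cup \mathcal{R}_{IV-B}(n,\mathbf{s}) \cup \mathcal{R}_{IV-C}(n,\mathbf{s})$.

Then for Problem (P1) in the case of two receivers with linear power-rate curves, for all
$\mathbf{x} \notin \mathcal{R}_{IV}(n,\mathbf{s})$, an optimal control action with $n$ slots remaining is given by:

\[
\mathbf{y}_n^*(\mathbf{x},\mathbf{s}) :=
\begin{cases}
\mathbf{x}, & \text{if } \mathbf{x} \in \mathcal{R}_{I}(n,\mathbf{s}) \\
\mathbf{b}_n(\mathbf{s}), & \text{if } \mathbf{x} \in \mathcal{R}_{II}(n,\mathbf{s}) \\
\left( f_n^1(x^2,\mathbf{s}), x^2 \right), & \text{if } \mathbf{x} \in \mathcal{R}_{III-A}(n,\mathbf{s}) \\
\left( x^1, f_n^2(x^1,\mathbf{s}) \right), & \text{if } \mathbf{x} \in \mathcal{R}_{III-B}(n,\mathbf{s})
\end{cases}
\tag{24}
\]

For all $\mathbf{x} \in \mathcal{R}_{IV}(n,\mathbf{s})$, there exists an optimal control action with $n$ slots remaining, $\mathbf{y}_n^*(\mathbf{x},\mathbf{s})$,
which satisfies:

\[ \mathbf{c}_{\mathbf{s}}^T \left[ \mathbf{y}_n^*(\mathbf{x},\mathbf{s}) - \mathbf{x} \right] = P. \tag{25} \]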



A detailed proof is included in Appendix A. Equation (25) says that it is optimal for the
transmitter to allocate the full power budget for transmission when the vector of receiver buffer
levels at the beginning of slot $n$ falls in region $\mathcal{R}_{IV}(n,\mathbf{s})$. We cannot say anything in general
about the optimal allocation (split) of the full power budget between the two receivers when
the starting buffer levels lie in region $\mathcal{R}_{IV}(n,\mathbf{s})$. Figure 6 shows the partition of $\mathbb{R}^2_+$ into the
seven regions, and a diagram of the structure of the optimal transmission policy. Note that the
figure shows the seven regions of the optimal policy for a fixed realization of the pair of channel
conditions. Under different pairs of channel realizations, the seven regions have the same general
form, but the targets $\mathbf{b}_n(\mathbf{s})$ are shifted and the boundary functions $f_n^1(x^2,\mathbf{s})$ and $f_n^2(x^1,\mathbf{s})$ are
different.
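The region structure of Theorem 8 can be sketched as a lookup that classifies a starting buffer vector and, outside the power-constrained region, returns the corresponding action from (24). This is a minimal illustration under stated assumptions: the critical vector `b` and the boundary functions `f1` and `f2` are taken as given inputs (in practice they come from solving the dynamic program), the function names are hypothetical, the region boundaries follow the definitions of Theorem 8 as read here, and the final fall-through relies on the theorem's statement that the seven regions partition the state space.

```python
def classify_region(x, b, f1, f2, c, P):
    """Return the label of the region containing x (illustrative sketch).

    x  : (x1, x2) buffer levels before transmission
    b  : (b1, b2) critical vector for the current channel pair
    f1 : callable, f1(x2) -> target level for user 1 given user 2's level
    f2 : callable, f2(x1) -> target level for user 2 given user 1's level
    c  : (c1, c2) per-packet power costs under the current channel pair
    P  : per-slot peak power limit
    """
    x1, x2 = x
    b1, b2 = b
    c1, c2 = c
    if x1 <= b1 and x2 <= b2:
        # Both buffers at or below their critical numbers: can b be reached?
        power_needed = c1 * (b1 - x1) + c2 * (b2 - x2)
        return "II" if power_needed <= P else "IV-B"
    if x2 > b2 and x1 < f1(x2):
        # Only user 1 needs packets; check whether its target is reachable.
        return "III-A" if x1 >= f1(x2) - P / c1 else "IV-A"
    if x1 > b1 and x2 < f2(x1):
        # Only user 2 needs packets; check whether its target is reachable.
        return "III-B" if x2 >= f2(x1) - P / c2 else "IV-C"
    return "I"


def optimal_action_outside_region_iv(x, b, f1, f2, c, P):
    """Post-transmission buffer levels per (24); None in region IV."""
    region = classify_region(x, b, f1, f2, c, P)
    if region == "I":
        return tuple(x)              # transmit nothing
    if region == "II":
        return tuple(b)              # fill both buffers to the critical vector
    if region == "III-A":
        return (f1(x[1]), x[1])      # transmit to user 1 only
    if region == "III-B":
        return (x[0], f2(x[0]))      # transmit to user 2 only
    return None                      # region IV: only the power equality (25) is known
```

In region IV the sketch deliberately returns `None`, since the theorem guarantees only that the full power budget is used and does not specify the split between receivers.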



[Figure 6: horizontal axis, buffer level of user 1 before transmission; vertical axis, buffer level of user 2 before transmission. The boundary curves are $f_n^1(x^2,s^1,s^2) = \inf\{\arg\min_{y^1 \in [d^1,\infty)} G_n(y^1,x^2,s^1,s^2)\}$ and $f_n^2(x^1,s^1,s^2) = \inf\{\arg\min_{y^2 \in [d^2,\infty)} G_n(x^1,y^2,s^1,s^2)\}$.]

Fig. 6. Optimal transmission policy for the two receiver case in slot $n$ when the state is $(\mathbf{x},\mathbf{s})$. The seven regions described
in Theorem 8 are labeled. The tails of the arrows represent the vectors of the receiver buffer levels at the beginning of slot $n$,
and the heads of the arrows represent the vectors of the receiver buffer levels after transmission but before playout in slot $n$
under the optimal transmission policy. In region $\mathcal{R}_{I}(n,\mathbf{s})$, a single dot represents that it is optimal to not transmit any packets
to either user. The ★ and ♦ represent possible starting buffer levels for Scenarios 1 and 2, respectively, in Example 1.



In some sense, the structure of the optimal policy outlined in Theorem 8 can be interpreted
as an extension of the modified base-stock policy for the case of a single receiver outlined in
Theorem 1. Namely, under each channel condition at each time, there is a critical number for each
receiver ($b_n^m(\mathbf{s})$) such that it is optimal to bring both receivers' buffer levels up to those critical
numbers if it is possible to do so (region $\mathcal{R}_{II}(n,\mathbf{s})$), and it is optimal to not transmit any packets
if both receivers' buffer levels start beyond their critical numbers (region $\mathcal{R}_{I}(n,\mathbf{s})$). However,
this extended notion of the modified base-stock policy only captures the optimal behavior in two
of the seven regions, and does not account for the coupling behavior between users that arises
through the joint power constraint. For instance, possible starting buffer levels for Scenario 1 and
Scenario 2 in Example 1 are illustrated in Figure 6 by the ★ and ♦, respectively. Even though
the buffer level of receiver 2 before transmission is the same under both scenarios, the optimal
transmission quantity to receiver 2 is different under the two scenarios due to the different
starting buffer levels of receiver 1.

B. Structure of the Optimal Policy for the Infinite Horizon Discounted Expected Cost Problems 

In this section, we show that the structure of the optimal stationary (or time-invariant) policy
for the infinite horizon discounted expected cost problem is the same as the structure of the
optimal policy for the finite horizon discounted expected cost problem. Moreover, the boundaries
of the seven regions of the finite horizon optimal policy shown in Figure 6 converge to the
boundaries of the seven regions of the infinite horizon discounted expected cost optimal policy
as the time horizon goes to infinity.

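Theorem 9. Define:

(i) $V_\infty(\mathbf{x},\mathbf{s}) := \lim_{n \to \infty} V_n(\mathbf{x},\mathbf{s})$, for all $\mathbf{x} \in \mathbb{R}^2_+$ and $\mathbf{s} \in \mathcal{S}$ (this limit exists).

(ii) $G_\infty(\mathbf{y},\mathbf{s}) := \mathbf{c}_{\mathbf{s}}^T \mathbf{y} + h(\mathbf{y}-\mathbf{d}) + \alpha \cdot \mathbb{E}\left[V_\infty(\mathbf{y}-\mathbf{d},\mathbf{S}') \mid \mathbf{S} = \mathbf{s}\right]$, for all $\mathbf{y} \in [d^1,\infty) \times [d^2,\infty)$
and $\mathbf{s} \in \mathcal{S}$.

(iii) $\mathcal{B}_\infty(\mathbf{s}) := \left\{ \hat{\mathbf{y}} \in [d^1,\infty) \times [d^2,\infty) : G_\infty(\hat{\mathbf{y}},\mathbf{s}) = \min_{\mathbf{y} \in [d^1,\infty) \times [d^2,\infty)} G_\infty(\mathbf{y},\mathbf{s}) \right\}$.

(iv) $b_\infty^1(\mathbf{s}) := \min\left\{ y^1 \in [d^1,\infty) : (y^1,y^2) \in \mathcal{B}_\infty(\mathbf{s}) \text{ for some } y^2 \in [d^2,\infty) \right\}$.

(v) $b_\infty^2(\mathbf{s}) := \min\left\{ y^2 \in [d^2,\infty) : (b_\infty^1(\mathbf{s}),y^2) \in \mathcal{B}_\infty(\mathbf{s}) \right\}$.

(vi) $\mathbf{b}_\infty(\mathbf{s}) := (b_\infty^1(\mathbf{s}), b_\infty^2(\mathbf{s}))$.

(vii) The functions

\[ f_\infty^1(x^2,\mathbf{s}) := \inf\left\{ \arg\min_{y^1 \in [d^1,\infty)} \left\{ G_\infty(y^1,x^2,s^1,s^2) \right\} \right\}, \text{ for } x^2 \in [d^2,\infty), \text{ and} \]
\[ f_\infty^2(x^1,\mathbf{s}) := \inf\left\{ \arg\min_{y^2 \in [d^2,\infty)} \left\{ G_\infty(x^1,y^2,s^1,s^2) \right\} \right\}, \text{ for } x^1 \in [d^1,\infty). \]

(viii) The seven regions $\mathcal{R}_{I}(\infty,\mathbf{s})$ through $\mathcal{R}_{IV-C}(\infty,\mathbf{s})$, defined in the same way as in Theorem 8, with
$n$ replaced by $\infty$.

Then

(a) $V_\infty(\mathbf{x},\mathbf{s})$ satisfies the $\alpha$-discounted cost optimality equation ($\alpha$-DCOE):

\[ V_\infty(\mathbf{x},\mathbf{s}) = \min_{\mathbf{y} \in \mathcal{A}^{\mathbf{d}}(\mathbf{x},\mathbf{s})} \left\{ \mathbf{c}_{\mathbf{s}}^T [\mathbf{y}-\mathbf{x}] + h(\mathbf{y}-\mathbf{d}) + \alpha \cdot \mathbb{E}\left[V_\infty(\mathbf{y}-\mathbf{d},\mathbf{S}') \mid \mathbf{S} = \mathbf{s}\right] \right\}, \quad \forall \mathbf{x} \in \mathbb{R}^2_+, \forall \mathbf{s} \in \mathcal{S}. \tag{26} \]

(b) An optimal stationary policy for Problem (P2) in the case of two receivers with linear
power-rate curves is given by $\pi^*_\infty = (\mathbf{y}^*_\infty, \mathbf{y}^*_\infty, \ldots)$, where

\[
\mathbf{y}^*_\infty(\mathbf{x},\mathbf{s}) :=
\begin{cases}
\mathbf{x}, & \text{if } \mathbf{x} \in \mathcal{R}_{I}(\infty,\mathbf{s}) \\
\mathbf{b}_\infty(\mathbf{s}), & \text{if } \mathbf{x} \in \mathcal{R}_{II}(\infty,\mathbf{s}) \\
\left( f_\infty^1(x^2,\mathbf{s}), x^2 \right), & \text{if } \mathbf{x} \in \mathcal{R}_{III-A}(\infty,\mathbf{s}) \\
\left( x^1, f_\infty^2(x^1,\mathbf{s}) \right), & \text{if } \mathbf{x} \in \mathcal{R}_{III-B}(\infty,\mathbf{s})
\end{cases}
\]

and for all $\mathbf{x} \in \mathcal{R}_{IV}(\infty,\mathbf{s})$, there exists an optimal control action, $\mathbf{y}^*_\infty(\mathbf{x},\mathbf{s})$, which satisfies:

\[ \mathbf{c}_{\mathbf{s}}^T \left[ \mathbf{y}^*_\infty(\mathbf{x},\mathbf{s}) - \mathbf{x} \right] = P. \]

(c) $\lim_{n \to \infty} \mathbf{b}_n(\mathbf{s}) = \mathbf{b}_\infty(\mathbf{s})$ for all $\mathbf{s} \in \mathcal{S}$.

(d) $\lim_{n \to \infty} f_n^1(x^2,\mathbf{s}) = f_\infty^1(x^2,\mathbf{s})$ for all $x^2 \in [d^2,\infty)$ and $\mathbf{s} \in \mathcal{S}$.

(e) $\lim_{n \to \infty} f_n^2(x^1,\mathbf{s}) = f_\infty^2(x^1,\mathbf{s})$ for all $x^1 \in [d^1,\infty)$ and $\mathbf{s} \in \mathcal{S}$.

A detailed proof of Theorem 9 is included in Appendix B.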
C. Structure of the Optimal Policy for the Infinite Horizon Average Expected Cost Problems

In this section, we again use the vanishing discount approach to show that the structure of the 
optimal policy for the finite horizon expected cost and infinite horizon discounted expected cost
problems extends to the infinite horizon average expected cost problem. As in Section IV-D, we
make explicit the dependence of the value functions and optimal policies from the corresponding 
infinite horizon discounted expected cost problem on the discount factor, a. 

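Theorem 10. For all $\alpha \in [0,1)$, define:

\[ m_{\infty,\alpha} := \inf_{(\mathbf{x},\mathbf{s}) \in \mathbb{R}^2_+ \times \mathcal{S}} V_{\infty,\alpha}(\mathbf{x},\mathbf{s}), \tag{27} \]
\[ \rho^* := \lim_{\alpha \nearrow 1} (1-\alpha) \cdot m_{\infty,\alpha}, \text{ and} \tag{28} \]
\[ w_{\infty,\alpha}(\mathbf{x},\mathbf{s}) := V_{\infty,\alpha}(\mathbf{x},\mathbf{s}) - m_{\infty,\alpha}, \quad \forall \mathbf{x} \in \mathbb{R}^2_+, \forall \mathbf{s} \in \mathcal{S}. \tag{29} \]

Then:

(a) There exists a continuous function $w_{\infty,1}(\cdot,\cdot)$ and a selector $\mathbf{y}^*_{\infty,1}(\cdot,\cdot)$ that satisfy the ACOE:

\[
\begin{aligned}
\rho^* + w_{\infty,1}(\mathbf{x},\mathbf{s}) &= \min_{\mathbf{y} \in \mathcal{A}^{\mathbf{d}}(\mathbf{x},\mathbf{s})} \left\{ \mathbf{c}_{\mathbf{s}}^T [\mathbf{y}-\mathbf{x}] + h(\mathbf{y}-\mathbf{d}) + \mathbb{E}\left[w_{\infty,1}(\mathbf{y}-\mathbf{d},\mathbf{S}') \mid \mathbf{S} = \mathbf{s}\right] \right\} \\
&= \mathbf{c}_{\mathbf{s}}^T \left[ \mathbf{y}^*_{\infty,1}(\mathbf{x},\mathbf{s}) - \mathbf{x} \right] + h\left( \mathbf{y}^*_{\infty,1}(\mathbf{x},\mathbf{s}) - \mathbf{d} \right) + \mathbb{E}\left[ w_{\infty,1}\left( \mathbf{y}^*_{\infty,1}(\mathbf{x},\mathbf{s}) - \mathbf{d}, \mathbf{S}' \right) \mid \mathbf{S} = \mathbf{s} \right], \quad \forall \mathbf{x} \in \mathbb{R}^2_+, \forall \mathbf{s} \in \mathcal{S}.
\end{aligned}
\tag{30}
\]

(b) The stationary policy $\pi^*_{\infty,1} = (\mathbf{y}^*_{\infty,1}, \mathbf{y}^*_{\infty,1}, \ldots)$ is optimal for Problem (P3) in the case of
two receivers with linear power-rate curves.

(c) The resulting optimal average cost beginning from any initial state $(\mathbf{x},\mathbf{s}) \in \mathbb{R}^2_+ \times \mathcal{S}$ is $\rho^*$.

(d) For every increasing sequence of discount factors $\{\alpha(l)\}_{l=1,2,\ldots}$ approaching 1, there exists
a subsequence $\{\alpha(l_i)\}_{i=1,2,\ldots}$ approaching 1 such that:

\[ w_{\infty,1}(\mathbf{x},\mathbf{s}) = \lim_{i \to \infty} w_{\infty,\alpha(l_i)}(\mathbf{x},\mathbf{s}), \quad \forall \mathbf{x} \in \mathbb{R}^2_+, \forall \mathbf{s} \in \mathcal{S}. \]

Therefore, for every $\mathbf{s} \in \mathcal{S}$, $w_{\infty,1}(\mathbf{x},\mathbf{s})$ is convex and supermodular in $\mathbf{x}$.

(e) For every $(\mathbf{x},\mathbf{s}) \in \mathbb{R}^2_+ \times \mathcal{S}$ and increasing sequence of discount factors $\{\alpha(l)\}_{l=1,2,\ldots}$
approaching 1, there exists a subsequence $\{\alpha(l_i)\}_{i=1,2,\ldots}$ approaching 1 and a sequence
$\{\mathbf{x}(i)\}_{i=1,2,\ldots}$ approaching $\mathbf{x}$ such that:

\[ \mathbf{y}^*_{\infty,1}(\mathbf{x},\mathbf{s}) = \lim_{i \to \infty} \mathbf{y}^*_{\infty,\alpha(l_i)}(\mathbf{x}(i),\mathbf{s}). \]

(f) There exists an optimal stationary policy with the same structure as statement (b) in Theorem 9.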

A detailed proof of Theorem 10 is included in Appendix C. 



D. Discussion 

At first glance, the structure of the optimal policy described in Theorem 8 may also seem
analogous to the structure of the optimal policy for the two-item resource-constrained inventory
problem with deterministic prices and stochastic demands (i.e., the reverse of our problem),
originally studied by Evans in [60], and revisited in [61]-[63]. The structure of the optimal
control action at each time for that problem can also be described in terms of seven regions that
look essentially the same as those shown in Figure 6.$^{10}$ However, there are two fundamental
differences that distinguish these two problems.

First, the function $G_n(\cdot)$ in the deterministic price and stochastic demand inventory problem
that corresponds to our function $G_n(\cdot,\mathbf{s})$ has an additional structural property that Chen calls
$\mu$-difference monotone [62]. This property is equivalent to the function $G_n(\cdot)$ not only being
supermodular, but also submodular with respect to a partial order introduced by Antoniadou in
[67], [68] called the direct value order (see [69] for further details). This functional property
leads to two additional structural results on the optimal control action: (i) when the initial vector
of inventories (corresponding to the vector of receivers' buffer levels in our problem) is in region
$\mathcal{R}_{IV-B}(n)$, there exists an optimal control action such that $\mathbf{y}_n^*(\mathbf{x}) \preceq \mathbf{b}_n$; and (ii) when the initial
vector of inventories is in region $\mathcal{R}_{IV-A}(n)$ (respectively, $\mathcal{R}_{IV-C}(n)$), there exists an optimal
control action that includes not ordering any of item 2 (respectively, item 1), corresponding to
not transmitting any packets to user 2 (respectively, user 1) in our problem. Due to the
time-varying channel conditions, this property does not hold for our function $G_n(\cdot,\mathbf{s})$, and these
two additional statements on the structure of the optimal policy are not true in general for our
problem, as shown by the following example.

$^{10}$ In the case of deterministic prices and stochastic demands, the boundaries of the regions do not depend on the ordering
price (corresponding to the channel conditions $\mathbf{s}$ in our case), because the vector of ordering prices is deterministic.

Example 2. Consider a single sender transmitting to two statistically identical receivers, whose
channel conditions are IID over time and independent of each other. The power-rate curves
are linear, and the possible per packet power costs are 1.750 (best possible channel condition),
2.000, 2.001, and 2.100 (worst possible channel condition). The associated probabilities of each
user experiencing these channel conditions are 0.4, 0.4, 0.1, and 0.1, respectively. The total
power constraint in each slot is $P = 4.2$, and 1 packet is removed from each receiver's buffer
at the end of each time slot (i.e., $\mathbf{d} = (1,1)$). We consider a finite horizon problem with the
discount rate $\alpha = 1$, and no holding costs. We are interested in the optimal control action with
$T = 3$ time slots remaining, and the current channel conditions are such that it costs 2.000 units
of power to transmit a packet to user 1, and 2.001 units of power to transmit a packet to user
2.

Exactly solving the dynamic program shows that the function $G_3(\cdot,\cdot,\mathbf{s}_3)$ has a unique global
minimizer, the critical vector $\mathbf{b}_3(\mathbf{s}_3)$. However, if the vector of starting receiver buffer levels at
time $T = 3$ is $\mathbf{x}_3 = (0.2, 0.2)$, the unique optimal scheduling decision in the slot is to transmit
0.8 packets to user 2, and use the remaining power for transmission to user 1, which results
in 1.2996 packets being sent to user 1. A diagram of this optimal control action is shown in
Figure 7. The interesting thing to note here is that despite being power-constrained (the vector
of starting buffer levels is in Region $\mathcal{R}_{IV-B}$), the unique optimal scheduling decision calls for
filling user 1's buffer beyond its critical number $b_3^1(\mathbf{s}_3)$. That is, the optimal scheduling
decision brings the buffer levels from Region $\mathcal{R}_{IV-B}$ to Region $\mathcal{R}_{III-B}$ rather than Region $\mathcal{R}_{II}$.
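The power split quoted in Example 2 can be verified with a few lines of arithmetic. The sketch below only reproduces the allocation stated in the example (0.8 packets to user 2, the remainder of the budget to user 1); it does not solve the dynamic program, and the variable names are chosen here for illustration.

```python
P = 4.2                    # per-slot power limit from Example 2
c1, c2 = 2.000, 2.001      # current per-packet power costs for users 1 and 2
x = (0.2, 0.2)             # starting buffer levels

z2 = 0.8                           # packets sent to user 2 (stated in the example)
power_user2 = c2 * z2              # power spent on user 2
z1 = (P - power_user2) / c1        # remaining power converted to packets for user 1
y = (x[0] + z1, x[1] + z2)         # buffer levels after transmission, before playout

print(round(z1, 4), tuple(round(v, 4) for v in y))
```

Rounded to four decimals, this gives 1.2996 packets for user 1 and post-transmission buffer levels of (1.4996, 1.0), which matches the quantity stated in the example and agrees with the approximate value (1.5, 1.0) reported in the caption of Figure 7.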

[Figure 7: horizontal axis, buffer level of user 1 before transmission; vertical axis, buffer level of user 2 before transmission.]

Fig. 7. Optimal scheduling decision with 3 slots remaining in Example 2. The action space is represented by the triangle
$\mathcal{A}^{\mathbf{d}}(\mathbf{x}_3,\mathbf{s}_3)$. The critical vector $\mathbf{b}_3(\mathbf{s}_3)$ is not reachable from the starting buffer levels $\mathbf{x}_3 = (0.2, 0.2)$. The unique optimal control
action is to choose $\mathbf{y}_3^*(\mathbf{x}_3,\mathbf{s}_3)$ (the buffer levels after transmission but before playout) to be (1.5, 1.0). The interesting feature of
the example is that even though $\mathbf{x}_3 \preceq \mathbf{b}_3(\mathbf{s}_3)$, we have $\mathbf{y}_3^*(\mathbf{x}_3,\mathbf{s}_3) \npreceq \mathbf{b}_3(\mathbf{s}_3)$.

The second fundamental difference is also a consequence of the time-varying channel
conditions in our model. In the infinite horizon version of the two-item inventory problem with
deterministic prices and stochastic demands, the critical numbers are time-invariant. Combined
with the above property that it is optimal to not order inventory so as to move out of regions
$\mathcal{R}_{II}$ and $\mathcal{R}_{IV-B}$, the time-invariant critical numbers mean that the region $\mathcal{R}_{II} \cup \mathcal{R}_{IV-B}$ (i.e.,
the lower-left square below the critical vector) is a "stability" region. Eventually, the vector of
inventories enters this region under the optimal ordering policy, and once it does, it never leaves.
This behavior both simplifies the analysis and opens the door for new mathematical techniques,
such as analyzing shortfall to compute the critical numbers [54], [63]. In our Problems (P2) and
(P3), even though the boundaries of the seven regions for each possible channel condition are
time-invariant, no such stability region exists, because the critical numbers vary over time due
to the time-varying channel conditions. This makes it significantly more difficult to determine
optimal and near-optimal policies.
VI. Extensions

In this section, we discuss the relaxation of the strict underflow constraints and the extension 
to the general case of M receivers. 

A. Relaxation of the Strict Underflow Constraints 

In some applications, it may not be the case that the peak power per slot is always sufficient to 
transmit one slot's worth of packets to each receiver, even under the worst channel conditions. 
In this case, a more appropriate model is to relax the strict underflow constraints, and allow 
underflow at a cost. One way to model this situation is to allow the receivers' queues to be 
negative, with a negative buffer level representing the number of packets that the playout process 
is behind. Then, in addition to the holding costs assessed on positive buffer levels, shortage 
costs are assessed on negative buffer levels. With some minor alterations to the proofs, it is 
straightforward to show that as long as the shortage cost function is a convex function of the 
negative buffer level, the structural results of Theorems 1, 3, and 8 are essentially unchanged by
the relaxation of the strict underflow constraints to loose underflow constraints with penalties on
underflow. This is not too surprising, as the strict underflow constraint case we consider can be
thought of as the limiting case as the penalties on underflow go to infinity.$^{11}$




B. Extension to the General Case of M Receivers 

Our ongoing work includes examining the extension to the most general case of $M$ receivers.
It is unlikely that the structure of the optimal policy in this case has a simple, intuitive, and
implementable form. Therefore, our approach is to find lower bounds on the value function and
a feasible policy whose expected cost is as close as possible to these bounds. One simple lower
bound to the value function can be found by relaxing the per slot peak power constraint of $P$
units of total power allocated to all users, and allowing up to $P$ units of power to be allocated
to each receiver in a single slot (for a total of up to $M \cdot P$). The advantage of this technique
is that it is easy to compute the lower bound, as the $M$-dimensional problem separates into $M$
instances of the 1-dimensional problem we know how to solve from Section III. However, the
resulting bound is likely to be loose. A second lower bounding method we are investigating is the
information relaxation method of Brown, Smith, and Sun [70]. The main idea there is to assume
the scheduler has access to future channel states (corresponding to the non-causal or offline
model often considered in the literature), but penalize the scheduler for using this information.
A well-chosen penalty function often leads to tight lower bounds on the value function.
A third method is the Lagrangian relaxation method discussed in [65], [66]. For our problem,
this method is equivalent to relaxing the per slot peak power constraint to an average power
constraint (i.e., the scheduler may allocate more than $P$ units of power in some slots, but the
average power consumed per slot over the duration of the horizon cannot exceed $P$). Like the first
method we mentioned, the resulting relaxed problem under this method can be separated into $M$
instances of a 1-dimensional problem, this time with an average power constraint instead
of a strict power constraint of $P$ for each receiver. A fourth lower bounding method is the linear
programming approach to approximate dynamic programming discussed in [66], [71], and [72].
The idea there is to formulate the dynamic program as a linear program, and approximate the
value functions as linear combinations of a set of basis functions. For a more in-depth comparison
of the Lagrangian relaxation and approximate linear programming approaches, see [66]. Once
lower bounds to the value function are determined from any of these methods, feasible policies
can be generated based on our structural results or via one-step greedy optimization with the
lower bounds substituted into the right-hand side of the dynamic programming equation.

$^{11}$ Tracking the number of packets that the playout process is behind in this manner corresponds to the complete backlogging
assumption in inventory theory. An alternate model is to say that a packet is of no use once it misses its deadline, penalize
missed packets, and keep the receiver queue length at zero. This model corresponds to the lost sales assumption in inventory
theory.

These same numerical techniques are most likely also the best way to approximate the
boundaries of the seven regions of the two receiver optimal policy, and determine a near-optimal
split of the power $P$ between the two receivers when the vector of starting receiver buffer levels
is in the power-constrained region $\mathcal{R}_{IV}(n,\mathbf{s})$.

The results we have presented in this paper are useful not only in terms of the intuition they
provide, but also in generating feasible policies for the most general case of $M$ receivers and
solving subproblems resulting from the relaxation methods described above.

VII. Conclusion 

In this paper, we considered the problem of transmitting data to one or more receivers over 
a shared wireless channel in a manner that minimizes power consumption and prevents the 
receivers' buffers from emptying. We showed that under the finite horizon discounted expected 
cost, infinite horizon discounted expected cost, and infinite horizon average expected cost criteria, 
the optimal transmission policy to a single receiver under linear power-rate curves has a modified 
base-stock structure. When the power-rate curves are generalized to piecewise-linear power-rate 
curves, the optimal transmission policy to a single receiver has a finite generalized base-stock 
structure. For the special case when holding costs are linear, the stochastic process representing 
the channel condition evolution over time is IID, and the maximum number of packets that can
be transmitted at any given marginal power cost in a slot is an integer multiple of the drainage 
rate of the receiver's buffer, we presented an efficient method to compute the critical numbers 
that fully characterize the modified base-stock and finite generalized base-stock policies. 

We also analyzed the structure of the optimal transmission policy for the case of two receivers. 
In some sense, the structure of the optimal policy was shown to be an extension of the modified 
base-stock policy; however, the peak power constraint couples the optimal scheduling of the 
two data streams, and the time-varying channel conditions may result in counterintuitive optimal 
scheduling decisions that are not possible in the analogous inventory theory problems. 

The extension to the most general case of M receivers is quite complex, and it is likely that 
numerical approximation techniques need to be used to develop further insights on the nature 
of the optimal policy. We presented a few possible approaches that constitute ongoing work in 
that regard. 

VIII. Appendix A - Finite Horizon Proofs 

A. Proof of Theorem 1 

Before proceeding to the proof of Theorem 1, we present a lemma due to Karush [73], which 
is presented in [74, pp. 237-238]. 

Lemma 1 (Karush, 1959). Suppose that $f : \mathbb{R} \to \mathbb{R}$ and that $f$ is convex on $\mathbb{R}$. For $v \le w$, 
define $f(v,w) := \min_{z \in [v,w]} f(z)$. Then it follows that: 

(a) $f(v,w)$ can be expressed as $f(v,w) = F_1(v) + F_2(w)$, where $F_1$ is convex nondecreasing and $F_2$ 
is convex nonincreasing on $\mathbb{R}$. 

(b) Suppose that $S$ is a minimizer of $f$ over $\mathbb{R}$. Then $f(v,w)$ can be expressed as: 

\[ f(v,w) = \begin{cases} f(v), & \text{if } S \le v \\ f(S), & \text{if } v \le S \le w \\ f(w), & \text{if } w \le S \end{cases} \]
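
Lemma 1 can be checked numerically on a small example. The sketch below is illustrative only: the quadratic test function and the names `karush_parts` and `constrained_min` are assumptions made for this example, and the particular choice $F_1(v) = f(\max(v,S))$, $F_2(w) = f(\min(w,S)) - f(S)$ is one standard decomposition satisfying part (a), not necessarily the one used in [73] or [74].

```python
def karush_parts(f, S):
    """Return (F1, F2) such that min over [v, w] of f equals F1(v) + F2(w).

    Assumes f is convex on the reals and S is a global minimizer of f.
    """
    def F1(v):
        # Minimum of f over [v, infinity): convex and nondecreasing in v.
        return f(max(v, S))

    def F2(w):
        # Minimum of f over (-infinity, w], shifted by f(S):
        # convex and nonincreasing in w.
        return f(min(w, S)) - f(S)

    return F1, F2


def constrained_min(f, S, v, w):
    """Part (b): evaluate f at the minimizer S clipped to [v, w]."""
    return f(min(max(S, v), w))


# Illustrative convex function with minimizer S = 3 (an assumption for the demo).
f = lambda z: (z - 3.0) ** 2 + 1.0
S = 3.0
F1, F2 = karush_parts(f, S)
for v, w in [(4.0, 6.0), (0.0, 5.0), (-2.0, 1.0)]:
    print(v, w, constrained_min(f, S, v, w), F1(v) + F2(w))
```

The three intervals in the loop correspond to the three cases of part (b): $S \le v$, $v \le S \le w$, and $w \le S$.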



Proof of Theorem 1: We present the proof in three parts. 

Part I - Modified Base-Stock Structure: Recall the dynamic programming equation: 

\[ V_n(x,s) = -c_s \cdot x + \min_{\max(x,d) \le y \le x + \frac{P}{c_s}} \{ g_n(y,s) \}, \quad n = N, N-1, \ldots, 1, \]

where $g_n(y,s) := c_s \cdot y + h(y-d) + \alpha \cdot \mathbb{E}[V_{n-1}(y-d, S_{n-1}) \mid S_n = s]$. We now show by 
induction on n that the following statements are true for every $n \in \{1,2,\ldots,N\}$ and all $s \in \mathcal{S}$: 

(i) $g_n(y,s)$ is convex in y on $[d, \infty)$. 

(ii) $\lim_{y \to \infty} g_n(y,s) = \infty$. 

(iii) $V_n(x,s)$ is convex in x on $\mathbb{R}_+$. 

Base Case: n = 1 

Let $s_1 \in \mathcal{S}$ be arbitrary. We have $g_1(y,s_1) = c_{s_1} \cdot y + h(y-d)$, which clearly satisfies (i) and 
(ii). $y_1^*(x,s_1) = \max(x,d)$ and thus $V_1(x,s_1) = c_{s_1} \cdot (d-x)^+ + h\big((x-d)^+\big)$, which is convex 
in x. We conclude (i)-(iii) are true at time n = 1, for all $s \in \mathcal{S}$. 

Induction Step: We now assume (i)-(iii) are true for n = m - 1 and all $s \in \mathcal{S}$, and show they hold 
for n = m and an arbitrary $s_m \in \mathcal{S}$. Let $s_{m-1} \in \mathcal{S}$ also be arbitrary. $V_{m-1}(y-d, s_{m-1})$ is convex 
in y, so $g_m(y,s_m)$ is convex in y as it is the sum of an affine function, $c_{s_m} \cdot y$, a convex function, 
$h(y-d)$, and a nonnegative weighted sum/integral of convex functions, $\alpha \cdot \mathbb{E}[V_{m-1}(y-d, S_{m-1}) \mid S_m = s_m]$ 
(see, e.g., [75, Section 3.2] for the relevant results on convexity-preserving operations). 
To show (ii) for n = m, we have $\lim_{y \to \infty} g_m(y,s_m) \ge \lim_{y \to \infty} c_{s_m} \cdot y = \infty$, where the inequality follows 
from $V_{m-1}(x,s_{m-1}) \ge 0$, $\forall x \in \mathbb{R}_+$, $\forall s_{m-1} \in \mathcal{S}$, and $h(y-d) \ge 0$. Moving on to (iii), we have: 

\[ V_m(x,s_m) = -c_{s_m} \cdot x + \min_{\max(x,d) \le y \le x + \frac{P}{c_{s_m}}} \{ g_m(y,s_m) \} = -c_{s_m} \cdot x + F_1(\max(x,d)) + F_2\Big(x + \frac{P}{c_{s_m}}\Big), \]

where, by Lemma 1, $F_1$ is convex nondecreasing and $F_2$ is convex nonincreasing. $F_1(\max(x,d))$ 
is also convex in x, as it is the composition of a convex nondecreasing function with a convex 
function; $F_2(x + P/c_{s_m})$ is convex in x, as it is the composition of a convex function with an 
affine function; and $V_m(x,s_m)$ is therefore convex in x. This concludes the induction step, and we 
conclude (i)-(iii) are true for all $n \in \{1,2,\ldots,N\}$. 
Next, we define the critical numbers $b_n(s)$ for all $n \in \{1,2,\ldots,N\}$ and $s \in \mathcal{S}$: 

\[ b_n(s) := \min \Big\{ \hat{y} \in [d,\infty) : g_n(\hat{y},s) = \min_{y \in [d,\infty)} g_n(y,s) \Big\}. \]

Note that by properties (i) and (ii) from the above induction, the minimum of $g_n(\cdot,s)$ over $[d,\infty)$ 
is achieved, and the set of minimizers over $[d,\infty)$ is a nonempty, closed, convex set. Thus, $b_n(s)$ 
is well-defined. The form of the optimal policy $y_n^*(x,s)$ then follows from part (b) of Karush's result, Lemma 1, 
with $g_n(y,s)$ playing the role of $f$, $\max(x,d)$ the role of $v$, $x + \frac{P}{c_s}$ the role of $w$, and $b_n(s)$ 
the role of $S$. 
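
Part (b) of Lemma 1 says the constrained minimizer is the unconstrained minimizer clipped to the feasible interval, which gives the modified base-stock rule a one-line form. The following is a minimal sketch under that reading; the function name `modified_base_stock` and the argument `max_packets` (standing for $P/c_s$, assumed to be at least $d$ so that the feasible interval is nonempty) are illustrative and not taken from the paper.

```python
def modified_base_stock(x, b, d, max_packets):
    """Target buffer level y after transmission (before playout).

    x           : buffer level at the start of the slot
    b           : critical number b_n(s), assumed >= d
    d           : packets drained per slot (underflow requires y >= d)
    max_packets : P / c_s, most packets affordable in the slot (assumed >= d)
    """
    lower = max(x, d)          # cannot remove packets; must cover playout
    upper = x + max_packets    # per-slot power constraint
    # Clip the unconstrained minimizer b to the feasible interval [lower, upper].
    return min(max(b, lower), upper)


# Example with d = 1, P / c_s = 2, and critical number b = 4.
for x in [5, 3, 1, 0]:
    y = modified_base_stock(x, b=4, d=1, max_packets=2)
    print(f"x = {x}: fill to y = {y}, transmit {y - x}")
```

In the example, the source sends nothing when the buffer is already above the critical number, fills exactly to it when that is affordable, and otherwise transmits at full power.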

Part II - Monotonicity of Thresholds in Time: In this section, we prove (7). We showed 
above that the optimal action with one time slot remaining is $y_1^*(x,s) = \max(x,d)$, for all $s \in \mathcal{S}$. 
This is precisely the modified base-stock policy of Theorem 1 with $b_1(s) = d$, as $\frac{P}{c_s}$ is at least as great as d. Thus, 
we conclude the far right equality in (7) holds: $b_1(s) = d$, $\forall s \in \mathcal{S}$. 

In order to show the far left inequality in (7), we claim more generally that $b_n(s) \le n \cdot d$, for 
all n and s. This follows from a simple interchange argument, as all packets transmitted beyond 
$n \cdot d$ incur transmission costs and holding costs for the duration of the horizon; however, they do 
not satisfy the playout requirements in any remaining slot. Thus, a policy that transmits enough 
packets to fill the buffer up to $n \cdot d$ at time n is strictly superior to a policy that transmits more 







packets. 

Next, we prove: 

\[ b_{n+1}(s) \ge b_n(s), \quad \forall s \in \mathcal{S},\ \forall n \in \{1,2,\ldots,N-1\}. \tag{31} \]

By Topkis' Theorem 2.8.1 [76, pg. 76], in order to show (31), it suffices to show that for all 
$s \in \mathcal{S}$, $n \in \{1,2,\ldots,N-1\}$, and $y^1, y^2 \in [d, (n+1) \cdot d]$, $y^1 \ge y^2$ implies: 

\[ g_{n+1}(y^1,s) - g_n(y^1,s) \le g_{n+1}(y^2,s) - g_n(y^2,s). \tag{32} \]

We let $s \in \mathcal{S}$ be arbitrary, and proceed by induction on the time slot n. 

Base Case: n = 1 

For all $y \in [d, 2d]$, 

\[ g_2(y,s) - g_1(y,s) = \alpha \cdot \mathbb{E}[V_1(y-d, S_1) \mid S_2 = s] = \alpha \cdot \mathbb{E}[c_{S_1} \mid S_2 = s] \cdot (2d - y), \]

which is decreasing in y as $\mathbb{E}[c_{S_1} \mid S_2 = s] > 0$. 

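Induction Step: We assume that (32) is true for all n = 1, 2, ..., m - 1 and $s \in \mathcal{S}$. We wish to 
show it is true for n = m. Let $y^1, y^2 \in [d, (m+1) \cdot d]$ be arbitrary, with $y^1 \ge y^2$. Also, let $s \in \mathcal{S}$ 
be arbitrary. Define: 

\[ \beta_1 := \min \Big\{ \operatorname*{argmin}_{\max(y^1-d,d) \le y \le y^1 - d + \frac{P}{c_s}} \{ g_{m-1}(y,s) \} \Big\}, \qquad \beta_2 := \min \Big\{ \operatorname*{argmin}_{\max(y^2-d,d) \le y \le y^2 - d + \frac{P}{c_s}} \{ g_m(y,s) \} \Big\}. \]

Note that: 

\[ \max(y^1 - d, d) \le \beta_1 \le \beta_1 \vee \beta_2 \le y^1 - d + \frac{P}{c_s}, \text{ and} \tag{33} \]
\[ \max(y^2 - d, d) \le \beta_1 \wedge \beta_2 \le \beta_2 \le y^2 - d + \frac{P}{c_s}. \tag{34} \]

We then have: 

\[ \begin{aligned}
\min_{\max(y^1-d,d) \le y \le y^1-d+\frac{P}{c_s}} \{ g_m(y,s) \} - \min_{\max(y^1-d,d) \le y \le y^1-d+\frac{P}{c_s}} \{ g_{m-1}(y,s) \}
&\le g_m(\beta_1 \vee \beta_2, s) - g_{m-1}(\beta_1, s) && (35) \\
&\le g_m(\beta_2, s) - g_{m-1}(\beta_1 \wedge \beta_2, s) && (36) \\
&\le \min_{\max(y^2-d,d) \le y \le y^2-d+\frac{P}{c_s}} \{ g_m(y,s) \} - \min_{\max(y^2-d,d) \le y \le y^2-d+\frac{P}{c_s}} \{ g_{m-1}(y,s) \}. && (37)
\end{aligned} \]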

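Equation (35) follows from (33), and (37) follows from (34). If $\beta_2 \ge \beta_1$, (36) holds with equality. 
Otherwise, it follows from the induction hypothesis. Since s was arbitrary, (37) holds for all $s \in \mathcal{S}$. 
Therefore, combined with the fact that the Markov process $\{S_n\}_{n=N,N-1,\ldots,1}$ is homogeneous, 
(37) implies: 

\[ \begin{aligned}
&\mathbb{E}\Big[ \min_{\max(y^1-d,d) \le y \le y^1-d+\frac{P}{c_{S_m}}} \{ g_m(y,S_m) \} \,\Big|\, S_{m+1} = s \Big] - \mathbb{E}\Big[ \min_{\max(y^1-d,d) \le y \le y^1-d+\frac{P}{c_{S_{m-1}}}} \{ g_{m-1}(y,S_{m-1}) \} \,\Big|\, S_m = s \Big] \\
&\quad \le \mathbb{E}\Big[ \min_{\max(y^2-d,d) \le y \le y^2-d+\frac{P}{c_{S_m}}} \{ g_m(y,S_m) \} \,\Big|\, S_{m+1} = s \Big] - \mathbb{E}\Big[ \min_{\max(y^2-d,d) \le y \le y^2-d+\frac{P}{c_{S_{m-1}}}} \{ g_{m-1}(y,S_{m-1}) \} \,\Big|\, S_m = s \Big]. \qquad (38)
\end{aligned} \]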

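Finally, we have: 

\[ \begin{aligned}
g_{m+1}(y^1,s) - g_m(y^1,s)
&= \alpha \cdot \mathbb{E}[V_m(y^1-d, S_m) \mid S_{m+1} = s] - \alpha \cdot \mathbb{E}[V_{m-1}(y^1-d, S_{m-1}) \mid S_m = s] \\
&= \alpha \cdot \mathbb{E}\Big[ \min_{\max(y^1-d,d) \le y \le y^1-d+\frac{P}{c_{S_m}}} \{ g_m(y,S_m) \} \,\Big|\, S_{m+1} = s \Big] - \alpha \cdot \mathbb{E}\Big[ \min_{\max(y^1-d,d) \le y \le y^1-d+\frac{P}{c_{S_{m-1}}}} \{ g_{m-1}(y,S_{m-1}) \} \,\Big|\, S_m = s \Big] && (39) \\
&\le \alpha \cdot \mathbb{E}\Big[ \min_{\max(y^2-d,d) \le y \le y^2-d+\frac{P}{c_{S_m}}} \{ g_m(y,S_m) \} \,\Big|\, S_{m+1} = s \Big] - \alpha \cdot \mathbb{E}\Big[ \min_{\max(y^2-d,d) \le y \le y^2-d+\frac{P}{c_{S_{m-1}}}} \{ g_{m-1}(y,S_{m-1}) \} \,\Big|\, S_m = s \Big] && (40) \\
&= \alpha \cdot \mathbb{E}[V_m(y^2-d, S_m) \mid S_{m+1} = s] - \alpha \cdot \mathbb{E}[V_{m-1}(y^2-d, S_{m-1}) \mid S_m = s] && (41) \\
&= g_{m+1}(y^2,s) - g_m(y^2,s).
\end{aligned} \]

Here, (39) and (41) follow from the fact that $\mathbb{E}[c_{S_{m-1}} \mid S_m = s] = \mathbb{E}[c_{S_m} \mid S_{m+1} = s]$, and 
(40) follows from (38). This completes the induction step, and the proof of (7). 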

Part III - Monotonicity of Thresholds in the Channel Condition: Finally, we show (8), 
the monotonicity of the thresholds in the channel condition, when the channel condition process 
is IID. The far left inequality follows from the same interchange argument described above, 
showing $b_n(s) \le n \cdot d$ for all s and n. We now show the far right equality of (8), $b_n(s_{worst}) = d$. 
To satisfy feasibility, we must have $b_n(s) \ge d$ for all $n \in \{1,2,\ldots,N\}$ and $s \in \mathcal{S}$. To see 
that $b_n(s_{worst}) \le d$, assume the channel condition at time n is $s_{worst}$, and consider two control 
policies of the modified base-stock form with the same critical numbers $b_m(s)$, for all times m < n. At time n, 
the first policy, $\pi^1$, transmits according to the modified base-stock policy with critical number $b_n(s_{worst}) = d + \epsilon$ ($\epsilon > 0$), 
and the second, $\pi^2$, transmits according to the modified base-stock policy with critical number $b_n(s_{worst}) = d$. These two 
strategies result in the same control action at time n if $x_n \ge d + \epsilon$, and we have already shown 
it is not optimal to fill the buffer beyond $n \cdot d$, so we only need to consider the case where 
$x_n < d + \epsilon$ and $\epsilon \le (n-1) \cdot d$. Let $Z_n^1, Z_{n-1}^1, \ldots, Z_1^1$ and $Z_n^2, Z_{n-1}^2, \ldots, Z_1^2$ be random variables 
representing the number of packets transmitted at times n, n-1, ..., 1 by $\pi^1$ and $\pi^2$, respectively. 
If $d \le x_n < d + \epsilon$, then $Z_n^2 = 0$ and $Z_n^1 - Z_n^2 = Z_n^1 = \min\{ \frac{P}{c_{max}}, d + \epsilon - x_n \}$. If $x_n < d$, then 
$Z_n^2 = d - x_n$, $Z_n^1 = \min\{ \frac{P}{c_{max}}, d + \epsilon - x_n \}$, and $Z_n^1 - Z_n^2 = \min\{ \frac{P}{c_{max}} - d + x_n, \epsilon \}$. Thus, for 
all $x_n < d + \epsilon$, we have $Z_n^1 - Z_n^2 \ge 0$. If $Z_n^1 - Z_n^2 = 0$, the two control policies result in the same 
actions for all remaining times, and therefore result in the same expected cost. So we only need to 
consider the case where $\Delta := Z_n^1 - Z_n^2 > 0$. Because the critical numbers at times n-1, n-2, ..., 1 
are the same for both policies, for any realization, $\omega$, of the channel condition over future times, 






we have $Z_m^1(\omega) \le Z_m^2(\omega)$, $\forall m \in \{n-1, \ldots, 1\}$. Moreover, because the scheduler must satisfy 
the playout requirements for the last n slots, we have $\sum_{m=1}^{n-1} \big( Z_m^2(\omega) - Z_m^1(\omega) \big) = \Delta$; i.e., over the 
remainder of the horizon, an extra $\Delta$ packets are transmitted under the second policy. The total 
discounted holding costs from time n until the end of the horizon are therefore lower for $\pi^2$ 
than $\pi^1$, because the number of packets remaining after transmission in each slot is never greater 
under policy $\pi^2$. Furthermore, the total discounted transmission costs of the extra $\Delta$ packets are 
also lower for $\pi^2$, as they are transmitted at the maximum cost $c_{max}$ under $\pi^1$, and transmitted 
later (and therefore discounted more heavily) under $\pi^2$. Thus, the total discounted transmission 
plus holding costs are lower for $\pi^2$ under all realizations, and the expected discounted cost of 
$\pi^2$ is lower than that of $\pi^1$. We conclude $b_n(s_{worst}) = d$. 

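To show $c_{s^1} \le c_{s^2}$ implies $b_n(s^1) \ge b_n(s^2)$, we follow Kalymon's methodology for the proof 
of Theorem 1.3 in [46]. For all $y \in [d, \infty)$, we have: 

\[ \begin{aligned}
g_n(y, s^2) &= c_{s^2} \cdot y + h(y-d) + \alpha \cdot \mathbb{E}[V_{n-1}(y-d, S_{n-1})] \\
&= (c_{s^2} - c_{s^1}) \cdot y + c_{s^1} \cdot y + h(y-d) + \alpha \cdot \mathbb{E}[V_{n-1}(y-d, S_{n-1})] \\
&= (c_{s^2} - c_{s^1}) \cdot y + g_n(y, s^1). && (42)
\end{aligned} \]

Assume $b_n(s^1) < b_n(s^2)$ for some $n \in \{1,2,\ldots,N\}$ and $s^1, s^2 \in \mathcal{S}$, with $c_{s^1} \le c_{s^2}$. If $c_{s^1} = c_{s^2}$, 
then (42) gives $g_n(\cdot, s^1) = g_n(\cdot, s^2)$ and the two critical numbers coincide, so we may take $c_{s^1} < c_{s^2}$. Substituting 
first $y = b_n(s^1)$ and then $y = b_n(s^2)$ into (42) yields: 

\[ \begin{aligned}
(c_{s^2} - c_{s^1}) \cdot b_n(s^1) + g_n(b_n(s^1), s^1) &= g_n(b_n(s^1), s^2) \\
&\ge g_n(b_n(s^2), s^2) \\
&= (c_{s^2} - c_{s^1}) \cdot b_n(s^2) + g_n(b_n(s^2), s^1). && (43)
\end{aligned} \]

Yet, $c_{s^1} < c_{s^2}$ and $b_n(s^1) < b_n(s^2)$ imply: 

\[ (c_{s^2} - c_{s^1}) \cdot b_n(s^1) < (c_{s^2} - c_{s^1}) \cdot b_n(s^2). \tag{44} \]

Equations (43) and (44) imply: 

\[ g_n(b_n(s^1), s^1) > g_n(b_n(s^2), s^1), \]

which clearly contradicts the fact that $b_n(s^1)$ is a global minimizer of $g_n(\cdot, s^1)$. We conclude 
that $c_{s^1} \le c_{s^2}$ implies $b_n(s^1) \ge b_n(s^2)$, completing the proofs of (8) and Theorem 1. □ 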
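
The threshold properties established in Parts II and III can be observed on a small instance by running the dynamic program backward on an integer grid. The sketch below is only an illustration under stated assumptions: an IID two-state channel, linear holding cost with per-packet rate `h`, integer `d`, and `P` an integer multiple of every per-packet cost; the function name `critical_numbers`, the grid size `M`, and all parameter values are made up for this example. Exact rational arithmetic is used so that ties in the argmin are not an artifact of rounding.

```python
from fractions import Fraction as Fr


def critical_numbers(N, d, P, costs, probs, h, alpha, M):
    """Backward induction for the linear power-rate, IID-channel case.

    Returns a dict b with b[n][s] = smallest minimizer of g_n(., s) over
    integer y >= d, for n = 1..N. Buffer levels are restricted to 0..M.
    """
    states = range(len(costs))
    V_prev = [[Fr(0)] * (M + 1) for _ in states]   # V_0(x, s) = 0
    b = {}
    for n in range(1, N + 1):
        # E[V_{n-1}(x, S)]; no conditioning is needed for an IID channel.
        EV = [sum(probs[s] * V_prev[s][x] for s in states) for x in range(M + 1)]
        V = [[None] * (M + 1) for _ in states]
        b[n] = []
        for s in states:
            # g_n(y, s) for y = d..M+d, stored at index y - d.
            g = [costs[s] * y + h * (y - d) + alpha * EV[y - d]
                 for y in range(d, M + d + 1)]
            # min() returns the first index attaining the minimum.
            b[n].append(d + min(range(len(g)), key=lambda i: g[i]))
            cap = P // costs[s]   # packets affordable in one slot
            for x in range(M + 1):
                lo, hi = max(x, d), min(x + cap, M + d)
                V[s][x] = -costs[s] * x + min(g[y - d] for y in range(lo, hi + 1))
        V_prev = V
    return b


# Illustrative instance: state 0 is the good channel (cost 1), state 1 the worst (cost 2).
b = critical_numbers(N=4, d=1, P=4, costs=[1, 2],
                     probs=[Fr(1, 2), Fr(1, 2)], h=Fr(1, 10),
                     alpha=Fr(19, 20), M=20)
for n in sorted(b):
    print(n, b[n])
```

On this instance the printed thresholds are nondecreasing in the number of remaining slots, never exceed $n \cdot d$, are larger in the cheaper channel state, and equal $d$ in the worst state, in line with (7) and (8).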

B. Proof of Theorem 5 

While the proof is similar in spirit to the proof of a finite generalized base-stock policy 
in [56, pp. 324-334], some key differences include the introduction of (i) stochastic channel 
conditions (ordering costs); (ii) the underflow constraint $x + z \ge d$; and (iii) the power constraint 
$z \le \tilde{z}_{max}(s)$. 

We show by induction on n that the following two statements are true for every $n \in \{1,2,\ldots,N\}$ 
and $s \in \mathcal{S}$: 

(i) $V_n(x,s)$ is convex in x on $\mathbb{R}_+$. 

(ii) There exists a nonincreasing sequence of critical numbers $\{b_{n,k}(s)\}_{k \in \{-1,0,\ldots,K\}}$ such that 
the optimal control action with n slots remaining is given by: 

\[ z_n^*(x,s) := \begin{cases}
\tilde{z}_{k-1}(s), & \text{if } b_{n,k}(s) - \tilde{z}_{k-1}(s) \le x < b_{n,k-1}(s) - \tilde{z}_{k-1}(s),\ k \in \{0,1,\ldots,K\} \\
b_{n,k}(s) - x, & \text{if } b_{n,k}(s) - \tilde{z}_k(s) \le x < b_{n,k}(s) - \tilde{z}_{k-1}(s),\ k \in \{0,1,\ldots,K-1\} \\
b_{n,K}(s) - x, & \text{if } b_{n,K}(s) - \tilde{z}_{max}(s) \le x < b_{n,K}(s) - \tilde{z}_{K-1}(s) \\
\tilde{z}_{max}(s), & \text{if } 0 \le x < b_{n,K}(s) - \tilde{z}_{max}(s)
\end{cases} \tag{45} \]

Base Case: n = 1 

\[ V_1(x,s) = \min_{\max(0,d-x) \le z \le \tilde{z}_{max}(s)} \{ c(z,s) + h(x+z-d) \} = c(\max\{0,d-x\}, s) + h(\max\{0,x-d\}), \tag{46} \]

which is convex because $c(\cdot,s)$ and $h(\cdot)$ are both convex and nondecreasing functions, and 
$\max\{0,d-x\}$ and $\max\{0,x-d\}$ are both convex functions (see, e.g., [75, Section 3.2] for 
the relevant results on convexity-preserving operations). Further, let $b_{1,-1}(s) = \infty$ and $b_{1,k}(s) = d$ 
for all $k \in \{0,1,\ldots,K\}$. Then (45) is equivalent to $z_1^*(x,s) = \max\{0,d-x\}$, which clearly 
achieves the minimum in (46). 



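Induction Step: We now assume (i)-(ii) are true for n = m - 1 and all $s \in \mathcal{S}$, and show they 
hold for n = m and an arbitrary $s \in \mathcal{S}$. Let $\bar{x}, \hat{x} \in \mathbb{R}_+$ and $\theta \in [0,1]$ be arbitrary, and define 
$x := \theta \cdot \bar{x} + (1-\theta) \cdot \hat{x}$. We have: 

\[ \begin{aligned}
V_m(\theta \cdot \bar{x} + (1-\theta) \cdot \hat{x}, s) &= V_m(x,s) \\
&= \min_{\max(0,d-x) \le z \le \tilde{z}_{max}(s)} \big\{ c(z,s) + h(x+z-d) + \alpha \cdot \mathbb{E}[V_{m-1}(x+z-d, S_{m-1}) \mid S_m = s] \big\} \\
&\le \min_{\substack{\max\{0,d-\bar{x}\} \le \bar{z} \le \tilde{z}_{max}(s) \\ \max\{0,d-\hat{x}\} \le \hat{z} \le \tilde{z}_{max}(s)}} \Big\{ c(\theta \bar{z} + (1-\theta)\hat{z}, s) + h(x + \theta \bar{z} + (1-\theta)\hat{z} - d) \\
&\qquad\qquad + \alpha \cdot \mathbb{E}[V_{m-1}(x + \theta \bar{z} + (1-\theta)\hat{z} - d, S_{m-1}) \mid S_m = s] \Big\} && (47) \\
&\le \min_{\substack{\max\{0,d-\bar{x}\} \le \bar{z} \le \tilde{z}_{max}(s) \\ \max\{0,d-\hat{x}\} \le \hat{z} \le \tilde{z}_{max}(s)}} \Big\{ \theta \cdot c(\bar{z},s) + (1-\theta) \cdot c(\hat{z},s) + \theta \cdot h(\bar{x}+\bar{z}-d) + (1-\theta) \cdot h(\hat{x}+\hat{z}-d) \\
&\qquad\qquad + \alpha \theta \cdot \mathbb{E}[V_{m-1}(\bar{x}+\bar{z}-d, S_{m-1}) \mid S_m = s] + \alpha (1-\theta) \cdot \mathbb{E}[V_{m-1}(\hat{x}+\hat{z}-d, S_{m-1}) \mid S_m = s] \Big\} && (48) \\
&= \theta \cdot \min_{\max\{0,d-\bar{x}\} \le \bar{z} \le \tilde{z}_{max}(s)} \big\{ c(\bar{z},s) + h(\bar{x}+\bar{z}-d) + \alpha \cdot \mathbb{E}[V_{m-1}(\bar{x}+\bar{z}-d, S_{m-1}) \mid S_m = s] \big\} \\
&\quad + (1-\theta) \cdot \min_{\max\{0,d-\hat{x}\} \le \hat{z} \le \tilde{z}_{max}(s)} \big\{ c(\hat{z},s) + h(\hat{x}+\hat{z}-d) + \alpha \cdot \mathbb{E}[V_{m-1}(\hat{x}+\hat{z}-d, S_{m-1}) \mid S_m = s] \big\} \\
&= \theta \cdot V_m(\bar{x},s) + (1-\theta) \cdot V_m(\hat{x},s),
\end{aligned} \]

where (48) follows from the convexity of $c(\cdot,s)$, $h(\cdot)$, and $\mathbb{E}[V_{m-1}(\cdot, S_{m-1}) \mid S_m = s]$, the 
last of which follows from the induction hypothesis. Equation (47) follows from the fact that 
for every $\max\{0,d-\bar{x}\} \le \bar{z} \le \tilde{z}_{max}(s)$ and $\max\{0,d-\hat{x}\} \le \hat{z} \le \tilde{z}_{max}(s)$, there exists a 
$\max\{0,d-x\} \le z \le \tilde{z}_{max}(s)$ (namely, $z := \theta \cdot \bar{z} + (1-\theta) \cdot \hat{z}$) such that: 

\[ \begin{aligned}
&c(z,s) + h(x+z-d) + \alpha \cdot \mathbb{E}[V_{m-1}(x+z-d, S_{m-1}) \mid S_m = s] \\
&\quad = c(\theta \bar{z} + (1-\theta)\hat{z}, s) + h(x + \theta \bar{z} + (1-\theta)\hat{z} - d) + \alpha \cdot \mathbb{E}[V_{m-1}(x + \theta \bar{z} + (1-\theta)\hat{z} - d, S_{m-1}) \mid S_m = s].
\end{aligned} \]

This concludes the induction step for (i) and we now proceed to (ii). 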

Note first that $g_m(y,s) = h(y-d) + \alpha \cdot \mathbb{E}[V_{m-1}(y-d, S_{m-1}) \mid S_m = s]$ is convex in y, as 
$h(\cdot)$ is convex, and $V_{m-1}(x,s)$ is convex in x for every $s \in \mathcal{S}$ by the induction hypothesis. Let 
$b_{m,-1}(s) := \infty$ and 

\[ b_{m,k}(s) := \max\Big\{ d, \inf\{ b \mid g_m'^{+}(b,s) \ge -\tilde{c}_k(s) \} \Big\}, \quad k \in \{0,1,\ldots,K\}, \]

where $g_m'^{+}(b,s)$ represents the right derivative: 

\[ g_m'^{+}(b,s) := \lim_{y \downarrow b} \frac{g_m(y,s) - g_m(b,s)}{y - b}, \]

which is nondecreasing and continuous from the right, by the convexity of $g_m(\cdot,s)$ [77, Section 24]. 
Note that $\{b_{m,k}(s)\}_{k \in \{-1,0,\ldots,K\}}$ is a nonincreasing sequence, because the sequence 
$\{\tilde{c}_k(s)\}_{k \in \{0,1,\ldots,K\}}$ is nondecreasing. We show the optimal control action $z_m^*(x,s)$ is then given 
by (45), by considering the four exhaustive cases. 

Case 1: $b_{m,k}(s) - \tilde{z}_{k-1}(s) \le x < b_{m,k-1}(s) - \tilde{z}_{k-1}(s)$, $k \in \{0,1,\ldots,K\}$ 

In order to show $z_m^*(x,s)$ is given by (45), it suffices to show: 

\[ c'^{+}(z,s) + g_m'^{+}(x+z,s) < 0, \text{ for } \max\{0,d-x\} \le z < \tilde{z}_{k-1}(s), \text{ and} \tag{49} \]
\[ c'^{+}(z,s) + g_m'^{+}(x+z,s) \ge 0, \text{ for } \tilde{z}_{k-1}(s) \le z \le \tilde{z}_{max}(s). \tag{50} \]

First, let $z \in [\max\{0,d-x\}, \tilde{z}_{k-1}(s))$ be arbitrary, and let $j \in \{0,1,\ldots,k-1\}$ be such that 
$z \in [\tilde{z}_{j-1}(s), \tilde{z}_j(s))$. If $b_{m,k-1}(s) = d$, then $b_{m,k}(s) = d$, as $d \le b_{m,k}(s) \le b_{m,k-1}(s) = d$. Yet, 
$b_{m,k}(s) = b_{m,k-1}(s) = d$ implies $d - \tilde{z}_{k-1}(s) \le x < d - \tilde{z}_{k-1}(s)$, which is vacuous. Therefore, 
we need only consider $b_{m,k-1}(s) = \inf\{ b \mid g_m'^{+}(b,s) \ge -\tilde{c}_{k-1}(s) \}$. By the construction of the 
piecewise-linear function $c(\cdot,s)$, $z < \tilde{z}_{k-1}(s)$ implies: 

\[ c'^{+}(z,s) = \tilde{c}_j(s) \le \tilde{c}_{k-1}(s). \tag{51} \]

We also have: 

\[ x + z < x + \tilde{z}_{k-1}(s) < b_{m,k-1}(s) = \inf\{ b \mid g_m'^{+}(b,s) \ge -\tilde{c}_{k-1}(s) \}, \]

which implies: 

\[ g_m'^{+}(x+z,s) < -\tilde{c}_{k-1}(s). \tag{52} \]

Summing (51) and (52) yields (49). 

Next, let $z \in [\tilde{z}_{k-1}(s), \tilde{z}_{max}(s)]$ be arbitrary, so that by construction of $c(\cdot,s)$: 

\[ c'^{+}(z,s) \ge \tilde{c}_k(s). \tag{53} \]

We also have: 

\[ x + z \ge x + \tilde{z}_{k-1}(s) \ge b_{m,k}(s) \ge \inf\{ b \mid g_m'^{+}(b,s) \ge -\tilde{c}_k(s) \}, \]

which, in combination with the nondecreasing nature of $g_m'^{+}(\cdot,s)$, implies: 

\[ g_m'^{+}(x+z,s) \ge g_m'^{+}\big( \inf\{ b \mid g_m'^{+}(b,s) \ge -\tilde{c}_k(s) \}, s \big). \tag{54} \]

Because $g_m'^{+}(\cdot,s)$ is continuous from the right, 

\[ g_m'^{+}\big( \inf\{ b \mid g_m'^{+}(b,s) \ge -\tilde{c}_k(s) \}, s \big) \ge -\tilde{c}_k(s). \tag{55} \]

Combining (54) and (55), and summing with (53) yields (50). 

Case 2: $b_{m,k}(s) - \tilde{z}_k(s) \le x < b_{m,k}(s) - \tilde{z}_{k-1}(s)$, $k \in \{0,1,\ldots,K-1\}$ 

In order to show $z_m^*(x,s)$ is given by (45), it suffices to show: 

\[ c'^{+}(z,s) + g_m'^{+}(x+z,s) < 0, \text{ for } \max\{0,d-x\} \le z < b_{m,k}(s) - x, \text{ and} \tag{56} \]
\[ c'^{+}(z,s) + g_m'^{+}(x+z,s) \ge 0, \text{ for } b_{m,k}(s) - x \le z \le \tilde{z}_{max}(s). \tag{57} \]

First, let $z \in [\max\{0,d-x\}, b_{m,k}(s) - x)$ be arbitrary. This case is vacuous if $b_{m,k}(s) = d$, 
so $b_{m,k}(s) = \inf\{ b \mid g_m'^{+}(b,s) \ge -\tilde{c}_k(s) \}$. Thus, we have: 

\[ x + z < b_{m,k}(s) = \inf\{ b \mid g_m'^{+}(b,s) \ge -\tilde{c}_k(s) \}, \]

which implies: 

\[ g_m'^{+}(x+z,s) < -\tilde{c}_k(s). \tag{58} \]

Furthermore, from $z < b_{m,k}(s) - x \le \tilde{z}_k(s)$ and the construction of the piecewise-linear function 
$c(\cdot,s)$, 

\[ c'^{+}(z,s) \le \tilde{c}_k(s). \tag{59} \]

Summing (58) and (59) yields (56). 

Next, let $z \in [b_{m,k}(s) - x, \tilde{z}_{max}(s)]$ be arbitrary, so that $z \ge b_{m,k}(s) - x > \tilde{z}_{k-1}(s)$, which by 
the construction of the piecewise-linear function $c(\cdot,s)$ implies: 

\[ c'^{+}(z,s) \ge \tilde{c}_k(s). \tag{60} \]

We also have $x + z \ge b_{m,k}(s) \ge \inf\{ b \mid g_m'^{+}(b,s) \ge -\tilde{c}_k(s) \}$. Therefore, because $g_m'^{+}(\cdot,s)$ is 
nondecreasing and continuous from the right, 

\[ g_m'^{+}(x+z,s) \ge g_m'^{+}\big( \inf\{ b \mid g_m'^{+}(b,s) \ge -\tilde{c}_k(s) \}, s \big) \ge -\tilde{c}_k(s). \tag{61} \]

Summing (60) and (61) yields (57). 



Case 3: $b_{m,K}(s) - \tilde{z}_{max}(s) \le x < b_{m,K}(s) - \tilde{z}_{K-1}(s)$ 

This case is the same as Case 2, with K in place of k, and $\tilde{z}_{max}(s)$ in place of $\tilde{z}_k(s)$. 

Case 4: $0 \le x < b_{m,K}(s) - \tilde{z}_{max}(s)$ 

Let $z \in [\max\{0,d-x\}, \tilde{z}_{max}(s))$ be arbitrary. $\tilde{z}_{max}(s) \ge d$ by assumption, so this case is vacuous 
if $b_{m,K}(s) = d$. Thus, we have $b_{m,K}(s) = \inf\{ b \mid g_m'^{+}(b,s) \ge -\tilde{c}_K(s) \}$, which, in combination with 
$x + z < x + \tilde{z}_{max}(s) < b_{m,K}(s)$, implies: 

\[ g_m'^{+}(x+z,s) < -\tilde{c}_K(s). \tag{62} \]

Additionally, $z < \tilde{z}_{max}(s)$ implies: 

\[ c'^{+}(z,s) \le \tilde{c}_K(s). \tag{63} \]

Summing (62) and (63) yields $c'^{+}(z,s) + g_m'^{+}(x+z,s) < 0$ for all $z \in [\max\{0,d-x\}, \tilde{z}_{max}(s))$, 
which implies $z_m^*(x,s) = \tilde{z}_{max}(s)$. 
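
The four cases translate directly into a lookup rule. The sketch below is a hypothetical illustration of the policy form in (45) for a single channel state, not code from the paper: the function name, the quadratic stand-in for $g_m$, and all numerical values are assumptions. With $g(y) = (y-10)^2/2$, the right derivative is $y - 10$, so the critical numbers are $b_k = \max\{d, 10 - \tilde{c}_k\}$.

```python
import math


def generalized_base_stock(x, b, z_break, z_max):
    """Transmission amount z under the finite generalized base-stock form.

    b       : critical numbers [b_0, ..., b_K], nonincreasing
    z_break : cost-curve breakpoints [z_0, ..., z_{K-1}], increasing
    z_max   : largest feasible transmission amount (plays the role of z_K)
    """
    K = len(b) - 1

    def z_t(k):   # conventions: z_{-1} = 0 and z_K = z_max
        return 0.0 if k < 0 else (z_max if k == K else z_break[k])

    def b_c(k):   # convention: b_{-1} = +infinity
        return math.inf if k < 0 else b[k]

    for k in range(K + 1):
        if b_c(k) - z_t(k - 1) <= x < b_c(k - 1) - z_t(k - 1):
            return z_t(k - 1)     # sit at a breakpoint of the cost curve
        if b_c(k) - z_t(k) <= x < b_c(k) - z_t(k - 1):
            return b_c(k) - x     # fill the buffer up to b_k
    return z_max                  # power-constrained region


# Illustrative data (K = 1): slopes 1 then 3, breakpoint at 2, z_max = 5, d = 1.
d, z_break, z_max = 1.0, [2.0], 5.0
b = [max(d, 10.0 - 1.0), max(d, 10.0 - 3.0)]   # b_0 = 9, b_1 = 7


def total_cost(x, z):
    """c(z) + g(x + z) for the illustrative cost curve and stand-in g."""
    c = z if z <= 2.0 else 2.0 + 3.0 * (z - 2.0)
    return c + 0.5 * (x + z - 10.0) ** 2


for x in [0.0, 1.0, 3.0, 6.0, 8.0, 9.5]:
    print(x, generalized_base_stock(x, b, z_break, z_max))
```

Comparing `total_cost` at the returned amount against a fine grid search over the feasible interval is a quick way to sanity-check a set of critical numbers before using them.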

C. Proof of Theorem 2 

We proceed in a manner similar to [52], incorporating the per slot peak power constraints and 
relaxing the linear ordering costs to piecewise-linear convex ordering costs. Before proving 
Theorem 2, we state and prove two lemmas. Let $\pi$ be a strategy that prescribes transmitting 
according to (17). 

Lemma 2. If $\pi$ is optimal for periods m-1, m-2, ..., 1, then 

\[ \alpha \cdot \mathbb{E}\big[ V_{l-1}((r-1) \cdot d + \eta, S) - V_{l-1}((r-1) \cdot d, S) \big] \ge -\eta \cdot (\gamma_{l,r+1} + h), \tag{64} \]

for all $(l,r,\eta) \in \mathcal{Z}_1 := \{ (l,r,\eta) \in \mathbb{N} \times \mathbb{N} \times [0,d] : 1 \le l \le m,\ 1 \le r \le l \}$. 

Proof. We proceed by induction on l. 

Base Case: l = 1 

l = 1 implies r = 1, so we have: 

\[ \begin{aligned}
\alpha \cdot \mathbb{E}\big[ V_{l-1}((r-1) \cdot d + \eta, S) - V_{l-1}((r-1) \cdot d, S) \big] &= \alpha \cdot \mathbb{E}[V_0(\eta,S) - V_0(0,S)] \\
&= 0 \\
&\ge -\eta \cdot h \\
&= -\eta \cdot (\gamma_{1,2} + h),
\end{aligned} \]

and we conclude (64) holds for l = 1. 



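Induction Step 

Assume (64) is true for l = 1, 2, ..., t and all r and $\eta$ such that $(l,r,\eta) \in \mathcal{Z}_1$. We show (64) 
is true for l = t + 1 by letting r and $\eta$ be arbitrary such that $(t+1,r,\eta) \in \mathcal{Z}_1$. Note that 
$(t+1,r,\eta) \in \mathcal{Z}_1$ implies $t \le m-1$, so $\pi$ is optimal at time t, and we have: 

\[ \begin{aligned}
&\alpha \cdot \mathbb{E}\big[ V_t((r-1) \cdot d + \eta, S) - V_t((r-1) \cdot d, S) \big] \\
&= \sum_{\{s:\ b_{t,0}(s) \le (r-1) \cdot d\}} \alpha \cdot p(s) \cdot \Big[ h \cdot \eta + \alpha \cdot \mathbb{E}\big[ V_{t-1}((r-2) \cdot d + \eta, S) - V_{t-1}((r-2) \cdot d, S) \big] \Big] \\
&\quad + \sum_{k=0}^{K-1} \Bigg\{ \sum_{\{s:\ (r-1+L_{k-1}(s)) \cdot d < b_{t,k}(s) \le (r-1+L_k(s)) \cdot d\}} -\alpha \cdot p(s) \cdot (\eta \cdot \tilde{c}_k(s)) \\
&\qquad + \sum_{\{s:\ b_{t,k+1}(s) \le (r-1+L_k(s)) \cdot d < b_{t,k}(s)\}} \alpha \cdot p(s) \cdot \Big[ h \cdot \eta + \alpha \cdot \mathbb{E}\big[ V_{t-1}((r-2+L_k(s)) \cdot d + \eta, S) - V_{t-1}((r-2+L_k(s)) \cdot d, S) \big] \Big] \Bigg\} \\
&\quad + \sum_{\{s:\ (r-1+L_{K-1}(s)) \cdot d < b_{t,K}(s) \le (r-1+L_{max}(s)) \cdot d\}} -\alpha \cdot p(s) \cdot (\eta \cdot \tilde{c}_K(s)) \\
&\quad + \sum_{\{s:\ b_{t,K}(s) > (r-1+L_{max}(s)) \cdot d\}} \alpha \cdot p(s) \cdot \Big[ h \cdot \eta + \alpha \cdot \mathbb{E}\big[ V_{t-1}((r-2+L_{max}(s)) \cdot d + \eta, S) - V_{t-1}((r-2+L_{max}(s)) \cdot d, S) \big] \Big] \\
&\ge \sum_{\{s:\ b_{t,0}(s) \le (r-1) \cdot d\}} -\alpha \cdot p(s) \cdot \eta \cdot \gamma_{t,r} \\
&\quad + \sum_{k=0}^{K-1} \Bigg\{ \sum_{\{s:\ (r-1+L_{k-1}(s)) \cdot d < b_{t,k}(s) \le (r-1+L_k(s)) \cdot d\}} -\alpha \cdot p(s) \cdot (\eta \cdot \tilde{c}_k(s)) + \sum_{\{s:\ b_{t,k+1}(s) \le (r-1+L_k(s)) \cdot d < b_{t,k}(s)\}} -\alpha \cdot p(s) \cdot \eta \cdot \gamma_{t,r+L_k(s)} \Bigg\} \\
&\quad + \sum_{\{s:\ (r-1+L_{K-1}(s)) \cdot d < b_{t,K}(s) \le (r-1+L_{max}(s)) \cdot d\}} -\alpha \cdot p(s) \cdot (\eta \cdot \tilde{c}_K(s)) + \sum_{\{s:\ b_{t,K}(s) > (r-1+L_{max}(s)) \cdot d\}} -\alpha \cdot p(s) \cdot \eta \cdot \gamma_{t,r+L_{max}(s)} \\
&= -\alpha \cdot \eta \cdot \Bigg\{ \sum_{\{s:\ \tilde{c}_0(s) \ge \gamma_{t,r}\}} p(s) \cdot \gamma_{t,r} + \sum_{k=0}^{K-1} \Bigg[ \sum_{\{s:\ \gamma_{t,r+L_k(s)} \le \tilde{c}_k(s) < \gamma_{t,r+L_{k-1}(s)}\}} p(s) \cdot \tilde{c}_k(s) + \sum_{\{s:\ \tilde{c}_k(s) < \gamma_{t,r+L_k(s)} \le \tilde{c}_{k+1}(s)\}} p(s) \cdot \gamma_{t,r+L_k(s)} \Bigg] \\
&\qquad + \sum_{\{s:\ \gamma_{t,r+L_{max}(s)} \le \tilde{c}_K(s) < \gamma_{t,r+L_{K-1}(s)}\}} p(s) \cdot \tilde{c}_K(s) + \sum_{\{s:\ \tilde{c}_K(s) < \gamma_{t,r+L_{max}(s)}\}} p(s) \cdot \gamma_{t,r+L_{max}(s)} \Bigg\} \\
&= -\eta \cdot (\gamma_{t+1,r+1} + h),
\end{aligned} \]

where the inequality follows from the induction hypothesis, and the penultimate equality follows 
from the definition $b_{n,k}(s) := j \cdot d$, if $\gamma_{n,j+1} \le \tilde{c}_k(s) < \gamma_{n,j}$. This concludes the induction step, and 
the proof of Lemma 2. □ 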

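Lemma 3. If $\pi$ is optimal for periods m-1, m-2, ..., 1, then 

\[ \alpha \cdot \mathbb{E}\big[ V_{l-1}((r-1) \cdot d - \eta, S) - V_{l-1}((r-1) \cdot d, S) \big] \ge \eta \cdot (\gamma_{l,r} + h), \tag{65} \]

for all $(l,r,\eta) \in \mathcal{Z}_2 := \{ (l,r,\eta) \in \mathbb{N} \times \mathbb{N} \times [0,d] : 2 \le l \le m,\ 2 \le r \le l \}$. 

Proof. We proceed by induction on l. 

Base Case: l = 2 

l = 2 implies r = 2, so we have: 

\[ \begin{aligned}
\alpha \cdot \mathbb{E}\big[ V_{l-1}((r-1) \cdot d - \eta, S) - V_{l-1}((r-1) \cdot d, S) \big] &= \alpha \cdot \mathbb{E}[V_1(d-\eta, S) - V_1(d,S)] \\
&= \alpha \cdot \mathbb{E}[c(\eta,S)] \\
&= \eta \cdot (\gamma_{2,2} + h),
\end{aligned} \]

where the last equality follows from $\gamma_{2,2} = -h + \alpha \cdot \mathbb{E}[\tilde{c}_0(S)]$, and the fact that $\eta \le \tilde{z}_0(s)$ for 
every $s \in \mathcal{S}$. So (65) holds with equality for l = 2. 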
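Induction Step 

Assume (65) is true for l = 2, 3, ..., t and all r and $\eta$ such that $(l,r,\eta) \in \mathcal{Z}_2$. We show (65) 
is true for l = t + 1 by letting r and $\eta$ be arbitrary such that $(t+1,r,\eta) \in \mathcal{Z}_2$. Note that 
$(t+1,r,\eta) \in \mathcal{Z}_2$ implies $t \le m-1$, so $\pi$ is optimal at time t, and we have: 

\[ \begin{aligned}
&\alpha \cdot \mathbb{E}\big[ V_t((r-1) \cdot d - \eta, S) - V_t((r-1) \cdot d, S) \big] \\
&= \sum_{\{s:\ b_{t,0}(s) \le (r-2) \cdot d\}} \alpha \cdot p(s) \cdot \Big[ -\eta \cdot h + \alpha \cdot \mathbb{E}\big[ V_{t-1}((r-2) \cdot d - \eta, S) - V_{t-1}((r-2) \cdot d, S) \big] \Big] \\
&\quad + \sum_{k=0}^{K-1} \Bigg\{ \sum_{\{s:\ (r-2+L_{k-1}(s)) \cdot d < b_{t,k}(s) \le (r-2+L_k(s)) \cdot d\}} \alpha \cdot p(s) \cdot (\eta \cdot \tilde{c}_k(s)) \\
&\qquad + \sum_{\{s:\ b_{t,k+1}(s) \le (r-2+L_k(s)) \cdot d < b_{t,k}(s)\}} \alpha \cdot p(s) \cdot \Big[ -\eta \cdot h + \alpha \cdot \mathbb{E}\big[ V_{t-1}((r-2+L_k(s)) \cdot d - \eta, S) - V_{t-1}((r-2+L_k(s)) \cdot d, S) \big] \Big] \Bigg\} \\
&\quad + \sum_{\{s:\ (r-2+L_{K-1}(s)) \cdot d < b_{t,K}(s) \le (r-2+L_{max}(s)) \cdot d\}} \alpha \cdot p(s) \cdot (\eta \cdot \tilde{c}_K(s)) \\
&\quad + \sum_{\{s:\ b_{t,K}(s) > (r-2+L_{max}(s)) \cdot d\}} \alpha \cdot p(s) \cdot \Big[ -\eta \cdot h + \alpha \cdot \mathbb{E}\big[ V_{t-1}((r-2+L_{max}(s)) \cdot d - \eta, S) - V_{t-1}((r-2+L_{max}(s)) \cdot d, S) \big] \Big] \\
&\ge \sum_{\{s:\ b_{t,0}(s) \le (r-2) \cdot d\}} \alpha \cdot p(s) \cdot \eta \cdot \gamma_{t,r-1} \\
&\quad + \sum_{k=0}^{K-1} \Bigg\{ \sum_{\{s:\ (r-2+L_{k-1}(s)) \cdot d < b_{t,k}(s) \le (r-2+L_k(s)) \cdot d\}} \alpha \cdot p(s) \cdot (\eta \cdot \tilde{c}_k(s)) + \sum_{\{s:\ b_{t,k+1}(s) \le (r-2+L_k(s)) \cdot d < b_{t,k}(s)\}} \alpha \cdot p(s) \cdot \eta \cdot \gamma_{t,r-1+L_k(s)} \Bigg\} \\
&\quad + \sum_{\{s:\ (r-2+L_{K-1}(s)) \cdot d < b_{t,K}(s) \le (r-2+L_{max}(s)) \cdot d\}} \alpha \cdot p(s) \cdot (\eta \cdot \tilde{c}_K(s)) + \sum_{\{s:\ b_{t,K}(s) > (r-2+L_{max}(s)) \cdot d\}} \alpha \cdot p(s) \cdot \eta \cdot \gamma_{t,r-1+L_{max}(s)} \\
&= \alpha \cdot \eta \cdot \Bigg\{ \sum_{\{s:\ \tilde{c}_0(s) \ge \gamma_{t,r-1}\}} p(s) \cdot \gamma_{t,r-1} + \sum_{k=0}^{K-1} \Bigg[ \sum_{\{s:\ \gamma_{t,r-1+L_k(s)} \le \tilde{c}_k(s) < \gamma_{t,r-1+L_{k-1}(s)}\}} p(s) \cdot \tilde{c}_k(s) + \sum_{\{s:\ \tilde{c}_k(s) < \gamma_{t,r-1+L_k(s)} \le \tilde{c}_{k+1}(s)\}} p(s) \cdot \gamma_{t,r-1+L_k(s)} \Bigg] \\
&\qquad + \sum_{\{s:\ \gamma_{t,r-1+L_{max}(s)} \le \tilde{c}_K(s) < \gamma_{t,r-1+L_{K-1}(s)}\}} p(s) \cdot \tilde{c}_K(s) + \sum_{\{s:\ \tilde{c}_K(s) < \gamma_{t,r-1+L_{max}(s)}\}} p(s) \cdot \gamma_{t,r-1+L_{max}(s)} \Bigg\} \\
&= \eta \cdot (\gamma_{t+1,r} + h),
\end{aligned} \]

where the inequality follows from the induction hypothesis, and the penultimate equality again 
follows from the definition of $b_{n,k}(s)$. This concludes the induction step, and the proof of Lemma 3. □ 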

We now return to the proof of Theorem 2. We first show by induction that $V_n^{\pi}(x,s) = 
V_n(x,s)$, $\forall n \in \{1,2,\ldots,N\}$, $\forall s \in \mathcal{S}$, and $\forall x \in \{0,d,2d,3d,\ldots\}$. 

Base Case: n = 1 

With one slot remaining, we have: 

\[ V_1(x,s) = \min_{\{\max(0,d-x) \le z_1 \le \tilde{z}_{max}(s)\}} \{ c(z_1,s) + h(x+z_1-d) \} = c\big(\max\{0,d-x\}, s\big) + h\big(\max\{0,x-d\}\big), \]

where the minimum is achieved by $z_1 = \max\{0,d-x\}$. $\gamma_{1,1} = \infty$ and $\gamma_{1,2} = 0$, so $b_{1,k}(s) = d$ for 
all $s \in \mathcal{S}$ and $k \in \{0,1,\ldots,K\}$. Thus, according to (17), $z_1^{\pi}(x,s)$ is also equal to $\max\{0,d-x\}$, 
the optimal amount. 

Induction Step 

Assume that for $n \in \{1,2,\ldots,m-1\}$, $V_n^{\pi}(x,s) = V_n(x,s)$, $\forall x \in \{0,d,2d,3d,\ldots\}$, $\forall s \in \mathcal{S}$. 
We show this is also true for n = m by considering first any strategy that transmits more than $\pi$ 
at time m, and then any strategy that transmits less than $\pi$ at time m. Let $s \in \mathcal{S}$ be arbitrary, with 
$\gamma_{m,j_k+1} \le \tilde{c}_k(s) < \gamma_{m,j_k}$ so that $\pi$ prescribes $b_{m,k}(s) = j_k \cdot d$ for $k \in \{0,1,\ldots,K\}$. Let $\pi^q$ be a 
strategy that at time m transmits enough to satisfy the demands of slots m, m-1, m-2, ..., q+1, 
and q, and transmits optimally at times m-1, m-2, ..., 1. 

Part I: Do not transmit more than suggested by $\pi$ at time m 

Let $\pi'(\epsilon)$ be a feasible strategy with $z'_m = z_m + \epsilon$, where $\epsilon > 0$, and the optimal transmission 
policy at times m-1, m-2, ..., 1. We consider four cases for the current buffer level x. 

Case (a): $j_k \cdot d - \tilde{z}_{k-1}(s) \le x < j_{k-1} \cdot d - \tilde{z}_{k-1}(s)$, $k \in \{0,1,\ldots,K\}$ 

In this case, $z_m = \tilde{z}_{k-1}(s)$. Let p be the integer such that $x + \tilde{z}_{k-1}(s) = p \cdot d$. Let $q, \eta$ be such 
that $z'_m = \tilde{z}_{k-1}(s) + \epsilon = q \cdot d + \eta - x$ and $0 \le \eta < d$ (i.e., $q = \lfloor \frac{z'_m + x}{d} \rfloor$ and $\eta = z'_m + x - q \cdot d$). 
Thus, we have $q \ge p \ge j_k$. 



Then we have: 



x,s - 



Vmi^^s) = c(^z'^,s^ - c(^z'^-ri,s 
+7] ■ h 



(66) 



+a ■ E [Kn_i ((g - 1) . d + r/, S) - Kn_i ((g - 1) ■ rf, S) 

> c(^z'^,s^ - c(^z'^~ri,s'^ 

> c(z'^,s) -c{z'^-r],s' 
-V ■ 7mjfe+i (67) 

> 1] ■ (Cfc(s) - 7m,ife+l) (68) 

> 0. (69) 



Equation ( [661 ) follows from Lemma |2| with I = m, r = q, and rj = rj. Equation ( [67] ) follows from 



g + 1 > jfe + 1, which implies 7m,g+i < 7mjfc+i- Equation ( [68] ) follows from z'^ — rj > Zk-i{s 
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and the construction of c(-, s). Finally, ([69]) follows from Cfc(s) > 7mjfe+i: by construction of j^, 
and we conclude: 



VZ'^^\x,s)>V::\x,s) . (70) 



Now let t G {q + 1, q + 2, . . . ,m — p,m — p + 1} be arbitrary. We have: 

VJ^'~\x,s) -V^'{x,s) = c(^{m-t + 2)-d-x,s^-c(^{m-t + l)-d-x,s^ 

+d ■ h 

+a ■ E \Vra-i {(m-t + l)- d, S) - Vrn~i {{m -t)-d, S) 

> c^(m — t + 2) ■ d — x,s^ — c{^m — t + 1) ■ d — x,s^ 

-d ■ 'Jm,m-t+2 (VI) 

> c^(m — t + 2) ■ d — x,s^ — c^(m — t + 1) ■ d — x,s^ 

-d ■ 7m,ifc+i (72) 

> ■ (Cfc(s) - 7mjfc+l) (73) 

> 0. (74) 



Equation fTT] ) follows from Lemma |2| with l = m, r = m — t + l < m — q < m = I, and r] = d. 



Equation (72) follows from: 



t <m-p + l ^ p+l<m-t + 2 ^ jk + 1 <m-t + 2 7m,jfc+i > lm,m-t+2 



Equation ( [73] ) follows from the construction of c(-, s) and the fact that: 



(m — t + 1) ■ d — X > 



m — {m — p + 1) + 1 



d — x = p- d — x = Zk-i{s) . 



Finally, ^74\ follows once again from Cfc(s) > 7mjfe+i5 by construction of j^. Rearranging (74) 
yields: 



C'" {x,s) > V:^\x,s), \/tE{q + l,q + 2,...,m-p,m-p+l}. 



(75) 
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Noting that V^{x, s) = (a;, s), (|70|) and repeated application of (|75[) imply: 



and we conclude tt is at least as good as 7r'(e). 

Case (b): $j_k \cdot d - \tilde{z}_k(s) \le x < j_k \cdot d - \tilde{z}_{k-1}(s)$, $k \in \{0,1,\ldots,K-1\}$ 

Let $q, \eta$ be such that $z'_m = (m-q+1) \cdot d + \eta - x$ and $0 \le \eta < d$ (i.e., $q = m + 1 - \lfloor \frac{z'_m + x}{d} \rfloor$ and 
$\eta = z'_m - (m-q+1) \cdot d + x$). Note that $m-q+1 \ge j_k$ by the assumption that $z'_m > z_m = j_k \cdot d - x$. 
Additionally, because $x < j_k \cdot d - \tilde{z}_{k-1}(s)$ and $m-q+1 \ge j_k$, we have: 

\[ (m-q+1) \cdot d - x > (m-q+1-j_k) \cdot d + \tilde{z}_{k-1}(s) \ge \tilde{z}_{k-1}(s), \]

which implies: 

\[ c\big((m-q+1) \cdot d + \eta - x, s\big) - c\big((m-q+1) \cdot d - x, s\big) \ge \eta \cdot \tilde{c}_k(s). \tag{76} \]

Then we have: 

\[ \begin{aligned}
V_m^{\pi'(\epsilon)}(x,s) - V_m^{\pi^q}(x,s) &= c\big((m-q+1) \cdot d + \eta - x, s\big) - c\big((m-q+1) \cdot d - x, s\big) + \eta \cdot h \\
&\quad + \alpha \cdot \mathbb{E}\big[ V_{m-1}((m-q) \cdot d + \eta, S) - V_{m-1}((m-q) \cdot d, S) \big] \\
&\ge c\big((m-q+1) \cdot d + \eta - x, s\big) - c\big((m-q+1) \cdot d - x, s\big) - \eta \cdot \gamma_{m,m-q+2} && (77) \\
&\ge c\big((m-q+1) \cdot d + \eta - x, s\big) - c\big((m-q+1) \cdot d - x, s\big) - \eta \cdot \gamma_{m,j_k+1} && (78) \\
&\ge \eta \cdot \big( \tilde{c}_k(s) - \gamma_{m,j_k+1} \big) && (79) \\
&\ge 0. && (80)
\end{aligned} \]

Equation (77) follows from Lemma 2, with l = m, $r = m-q+1 \le m$, and $\eta = \eta$. Equation 
(78) follows from $m-q+2 \ge j_k+1$, which implies $\gamma_{m,m-q+2} \le \gamma_{m,j_k+1}$. Equation (79) follows 
from (76). Finally, (80) follows from $\tilde{c}_k(s) \ge \gamma_{m,j_k+1}$, by construction of $j_k$, and we conclude: 

\[ V_m^{\pi'(\epsilon)}(x,s) \ge V_m^{\pi^q}(x,s). \tag{81} \]

Now let $t \in \{q+1, q+2, \ldots, m-j_k, m-j_k+1\}$ be arbitrary. We have: 

\[ \begin{aligned}
V_m^{\pi^{t-1}}(x,s) - V_m^{\pi^t}(x,s) &= c\big((m-t+2) \cdot d - x, s\big) - c\big((m-t+1) \cdot d - x, s\big) + d \cdot h \\
&\quad + \alpha \cdot \mathbb{E}\big[ V_{m-1}((m-t+1) \cdot d, S) - V_{m-1}((m-t) \cdot d, S) \big] \\
&\ge c\big((m-t+2) \cdot d - x, s\big) - c\big((m-t+1) \cdot d - x, s\big) - d \cdot \gamma_{m,m-t+2} && (82) \\
&\ge c\big((m-t+2) \cdot d - x, s\big) - c\big((m-t+1) \cdot d - x, s\big) - d \cdot \gamma_{m,j_k+1} && (83) \\
&\ge d \cdot \big( \tilde{c}_k(s) - \gamma_{m,j_k+1} \big) && (84) \\
&\ge 0. && (85)
\end{aligned} \]

Equation (82) follows from Lemma 2, with l = m, $r = m-t+1 \le m-q < m = l$, and $\eta = d$. 
Equation (83) follows from: 

\[ t \le m-j_k+1 \ \Rightarrow\ j_k+1 \le m-t+2 \ \Rightarrow\ \gamma_{m,j_k+1} \ge \gamma_{m,m-t+2}. \]

Similarly to (76), equation (84) follows from the fact that: 

\[ (m-t+1) \cdot d - x \ge \big( m - (m-j_k+1) + 1 \big) \cdot d - x = j_k \cdot d - x > \tilde{z}_{k-1}(s). \]

Finally, (85) follows once again from $\tilde{c}_k(s) \ge \gamma_{m,j_k+1}$, by construction of $j_k$. Rearranging (85) 
yields: 

\[ V_m^{\pi^{t-1}}(x,s) \ge V_m^{\pi^t}(x,s), \quad \forall t \in \{q+1, q+2, \ldots, m-j_k, m-j_k+1\}. \tag{86} \]

Noting that $V_m^{\pi}(x,s) = V_m^{\pi^{m-j_k+1}}(x,s)$, (81) and repeated application of (86) imply: 

\[ V_m^{\pi'(\epsilon)}(x,s) \ge V_m^{\pi^q}(x,s) \ge V_m^{\pi^{q+1}}(x,s) \ge \cdots \ge V_m^{\pi^{m-j_k+1}}(x,s) = V_m^{\pi}(x,s), \]

and we conclude $\pi$ is at least as good as $\pi'(\epsilon)$. 

Case (c): $j_K \cdot d - \tilde{z}_{max}(s) \le x < j_K \cdot d - \tilde{z}_{K-1}(s)$ 
Same as Case (b) with K replacing k. 
Case (d): $0 \le x < j_K \cdot d - \tilde{z}_{max}(s)$ 

$z_m(x,s) = \tilde{z}_{max}(s)$, the upper bound of the action space, so it is not feasible to transmit more. 

Part II: Do not transmit less than suggested by $\pi$ at time m 

Let $\pi''(\epsilon)$ be a feasible strategy with $z''_m = z_m - \epsilon$, where $\epsilon > 0$, and the optimal transmission 
policy at times m-1, m-2, ..., 1. To satisfy feasibility, we require $z_m - \epsilon \ge \max(0, d-x)$. 
Define $\eta := \epsilon - \lfloor \frac{\epsilon}{d} \rfloor \cdot d$, and note that $\eta \in [0,d)$. Let $\pi_\theta^l$ be a strategy that at time m satisfies 
the demands of periods m, m-1, ..., l, except for $\theta$ units of the demand of period l, where 
$0 \le \theta < d$, and behaves optimally in slots m-1, m-2, ..., 1. We consider four exhaustive 
cases for the current buffer level x. 

Case (a): $x \ge j_0 \cdot d$ 

$z_m(x,s) = 0$, the lower bound of the action space, so it is not feasible to transmit less. 

Case (b): $j_k \cdot d - \tilde{z}_k(s) \le x < j_k \cdot d - \tilde{z}_{k-1}(s)$, $k \in \{0,1,\ldots,K\}$, where we define $\tilde{z}_K(s) := \tilde{z}_{max}(s)$ 

Define q := m — j^ + I + [^J . By the feasibility of 7r"(e) and e > 0, we have 
q E {m — jk + l,m — jk + 2,...,m — 2,m — 1}. Furthermore, we have: 



[m — q + 1] ■ d — X 



m-[m-j, + l+[^\)+l].d-x<j,.d-x<Zk{s 



which, by the construction of c(-, s), implies: 

c^(m — q + 1) ■ d — 1] — X, — c^(m — q + 1) ■ d — x,s^ > —rj ■ Ck{s) . (87) 
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We now compare tt^ and ttq: 

V^^{x, s) — V^°{x, s) = c^(m — q + 1) ■ d — T] — X, — c^(m — q + 1) ■ d — x,s 

—h ■ 7] 

+a ■ E Vm-i{{m - q) ■ d-r],S) - Vm-i{{m - q) ■ d, S) 

> c^(m — q + 1) ■ d — T] — X, — c^(m — q + 1) ■ d — x,s^ 

+ri ■ 7m,m-g+l (88) 

> c^(m — q + 1) ■ d — T] — X, — c^(m — q + 1) ■ d — x,s^ 

+V ■ 7m,i, (89) 



> rj ■ 

> 0. 



7mjfc - Ck[S) 



(90) 
(91) 



Equation ( [88] ) follows from Lemma [3] with r = m — q + l<m = l and t] = rj. Equation 
follows from: 

q>m-jk + l ^ m-q + l<jk ^ Jmjk 



Equation ( |90l ) follows from ([87]). Finally, ( [91] ) follows from Cfc(s) < 7m,jfc- Rearranging (91) 
yields: 



V:,^{x,s)<VZ-{x,s) 



(92) 
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Next, let t E {m — + l,m — jk + 2, . . . ,m — 1} he arbitrary. We have: 

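\begin{align}
V_m^{\pi^{t+1}}(x, s) - V_m^{\pi^t}(x, s) &= c\big((m - t) \cdot d - x, s\big) - c\big((m - t + 1) \cdot d - x, s\big) - h \cdot d \nonumber \\
&\quad + \alpha \cdot \mathbb{E}\big[ V_{m-1}((m - t - 1) \cdot d, S) - V_{m-1}((m - t) \cdot d, S) \big] \nonumber \\
&\ge c\big((m - t) \cdot d - x, s\big) - c\big((m - t + 1) \cdot d - x, s\big) + d \cdot \gamma_{m, m-t+1} \tag{93} \\
&\ge c\big((m - t) \cdot d - x, s\big) - c\big((m - t + 1) \cdot d - x, s\big) + d \cdot \gamma_{m, j_k} \tag{94} \\
&\ge d \cdot \big[ \gamma_{m, j_k} - \tilde{c}_k(s) \big] \tag{95} \\
&\ge 0. \tag{96}
\end{align}

Equation (93) follows from Lemma 3 with $r = m - t \le m = l$ and $\eta = d$. Equation (94) follows from:

\[ t \ge m - j_k + 1 \;\Rightarrow\; m - t + 1 \le j_k \;\Rightarrow\; \gamma_{m, j_k} \le \gamma_{m, m-t+1}. \]

Equation (95) follows from the construction of $c(\cdot, s)$ and the fact that:

\[ (m - t + 1) \cdot d - x \le \big( m - (m - j_k + 1) + 1 \big) \cdot d - x = j_k \cdot d - x \le \tilde{z}_k(s). \]

Finally, (96) follows from $\tilde{c}_k(s) \le \gamma_{m, j_k}$. Rearranging (96) yields

\[ V_m^{\pi^t}(x, s) \le V_m^{\pi^{t+1}}(x, s), \quad \forall t \in \{m - j_k + 1, m - j_k + 2, \ldots, m - 1\}. \tag{97} \]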



Noting that $\pi = \pi^{m - j_k + 1}$, (92) and repeated application of (97) imply:

\[ V_m^{\pi}(x, s) = V_m^{\pi^{m - j_k + 1}}(x, s) \le \cdots \le V_m^{\pi^q_0}(x, s) \le V_m^{\pi^q_\eta}(x, s) = V_m^{\pi''(\epsilon)}(x, s), \tag{98} \]

and we conclude $\pi$ is at least as good as $\pi''(\epsilon)$.

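Case (c): $j_k \cdot d - \tilde{z}_{k-1}(s) \le x < j_{k-1} \cdot d - \tilde{z}_{k-1}(s)$, $k \in \{1, \ldots, K\}$

In this case, $\pi = \pi^{m + 1 - \frac{x + \tilde{z}_{k-1}(s)}{d}}$. Define $p := m + 1 - \frac{x + \tilde{z}_{k-1}(s)}{d}$ and $q := p + \lfloor \epsilon / d \rfloor$. Again, we start by comparing $\pi^q_\eta$ and $\pi^q_0$:

\begin{align}
V_m^{\pi^q_\eta}(x, s) - V_m^{\pi^q_0}(x, s) &= c\big((m - q + 1) \cdot d - \eta - x, s\big) - c\big((m - q + 1) \cdot d - x, s\big) - h \cdot \eta \nonumber \\
&\quad + \alpha \cdot \mathbb{E}\big[ V_{m-1}((m - q) \cdot d - \eta, S) - V_{m-1}((m - q) \cdot d, S) \big] \nonumber \\
&\ge c\big((m - q + 1) \cdot d - \eta - x, s\big) - c\big((m - q + 1) \cdot d - x, s\big) + \eta \cdot \gamma_{m, m-q+1} \tag{99} \\
&\ge c\big((m - q + 1) \cdot d - \eta - x, s\big) - c\big((m - q + 1) \cdot d - x, s\big) + \eta \cdot \gamma_{m, j_{k-1}} \tag{100} \\
&\ge \eta \cdot \big[ \gamma_{m, j_{k-1}} - \tilde{c}_{k-1}(s) \big] \tag{101} \\
&\ge 0. \tag{102}
\end{align}

Equation (99) follows from Lemma 3 with $r = m - q + 1 \le m = l$ and $\eta = \eta$. Equation (100) follows from:

\[ m - q + 1 = m - \left[ m + 1 - \frac{x + \tilde{z}_{k-1}(s)}{d} + \lfloor \epsilon / d \rfloor \right] + 1 \le \frac{x + \tilde{z}_{k-1}(s)}{d} < j_{k-1}, \]

which implies $\gamma_{m, j_{k-1}} \le \gamma_{m, m-q+1}$. Equation (101) follows from $(m - q + 1) \cdot d - x \le \tilde{z}_{k-1}(s)$ and the construction of $c(\cdot, s)$. Finally, (102) follows from $\tilde{c}_{k-1}(s) \le \gamma_{m, j_{k-1}}$. Rearranging (102) yields

\[ V_m^{\pi^q_0}(x, s) \le V_m^{\pi^q_\eta}(x, s). \tag{103} \]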






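Next, let $t \in \{p, p + 1, \ldots, q - 1\}$ be arbitrary. We have:

\begin{align}
V_m^{\pi^{t+1}}(x, s) - V_m^{\pi^t}(x, s) &= c\big((m - t) \cdot d - x, s\big) - c\big((m - t + 1) \cdot d - x, s\big) - h \cdot d \nonumber \\
&\quad + \alpha \cdot \mathbb{E}\big[ V_{m-1}((m - t - 1) \cdot d, S) - V_{m-1}((m - t) \cdot d, S) \big] \nonumber \\
&\ge c\big((m - t) \cdot d - x, s\big) - c\big((m - t + 1) \cdot d - x, s\big) + d \cdot \gamma_{m, m-t+1} \tag{104} \\
&\ge c\big((m - t) \cdot d - x, s\big) - c\big((m - t + 1) \cdot d - x, s\big) + d \cdot \gamma_{m, j_{k-1}} \tag{105} \\
&\ge d \cdot \big[ \gamma_{m, j_{k-1}} - \tilde{c}_{k-1}(s) \big] \tag{106} \\
&\ge 0. \tag{107}
\end{align}

Equation (104) follows from Lemma 3 with $r = m - t \le m = l$ and $\eta = d$. Equation (105) follows from:

\[ t \ge p \;\Rightarrow\; m - t + 1 \le m - p + 1 = \frac{x + \tilde{z}_{k-1}(s)}{d} < j_{k-1} \;\Rightarrow\; \gamma_{m, j_{k-1}} \le \gamma_{m, m-t+1}. \]

Equation (106) follows from the construction of $c(\cdot, s)$ and the fact that:

\[ (m - t + 1) \cdot d - x \le (m - p + 1) \cdot d - x = \tilde{z}_{k-1}(s). \]

Finally, (107) follows from $\tilde{c}_{k-1}(s) \le \gamma_{m, j_{k-1}}$. Rearranging (107) yields

\[ V_m^{\pi^t}(x, s) \le V_m^{\pi^{t+1}}(x, s), \quad \forall t \in \{p, p + 1, \ldots, q - 1\}. \tag{108} \]

Then (103) and repeated application of (108) yield

\[ V_m^{\pi}(x, s) = V_m^{\pi^p}(x, s) \le \cdots \le V_m^{\pi^{q-1}}(x, s) \le V_m^{\pi^q_0}(x, s) \le V_m^{\pi^q_\eta}(x, s) = V_m^{\pi''(\epsilon)}(x, s), \]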




and we conclude $\pi$ is at least as good as $\pi''(\epsilon)$.

Case (d): $0 \le x < j_K \cdot d - \tilde{z}_{\max}(s)$

The same argument as Case (c) applies with $k$ replaced by $K + 1$ and $\tilde{z}_K(s) = \tilde{z}_{\max}(s)$. This completes Part II.

From Parts I and II, we conclude $\pi$ is optimal if the starting queue level is an integer multiple of the demand $d$. By assumption, the starting queue level $x$ at time $N$ is zero. Thus, $\pi$ is optimal at time $N$. $z^*_N(x, s) = z_N(x, s)$ will also be an integer multiple of demand, as $b_{N,k}(s)$ and $\{\tilde{z}_k(s)\}_{k \in \{0, 1, \ldots, K\}}$ are all integer multiples of $d$. It follows that the queue level at the end of slot $N$ (equal to the queue level at the beginning of slot $N - 1$), $z^*_N(x, s) - d$, will also be an integer multiple of $d$. Continuing this logic, if the strategy $\pi$ is used, the queue level at the beginning of each subsequent time slot will be an integer multiple of demand. Thus, $\pi$ is optimal. □

D. Proof of Theorem 7

We prove statements (i)-(v) by joint induction on the time remaining, $n$.
Base Case: $n = 1$

$V_0(x, s_0) = 0$ for all $s_0$, so (i) and (ii) hold trivially. Let $s_1 \in \mathcal{S}$ be arbitrary. $G_1(y_1, s_1) = c_{s_1}^T y_1 + h(y_1 - d)$, which is convex and supermodular. Thus, (iii) and (iv) are true. Additionally, $G_1(y_1, s_1) = \sum_{m=1}^{2} \{ c_{s_1^m} \cdot y_1^m + h^m(y_1^m - d^m) \}$, so $\inf \{ \arg\min_{y_1^2 \in [d^2, \infty)} \{ G_1(y_1^1, y_1^2, s_1^1, s_1^2) \} \}$ is independent of $y_1^1$, and vice versa. Thus, (v) is true for $n = 1$, completing the base case.
Induction Step 

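Assume statements (i)-(v) are true for $n = 2, 3, \ldots, l - 1$. We want to show they are true for $n = l$. We let $s \in \mathcal{S}$ be arbitrary, and proceed in order.

(i) Consider two arbitrary points, $\bar{x}, \tilde{x} \in \mathbb{R}^2_+$. Let $\lambda \in [0, 1]$ be arbitrary, and define $\hat{x} := \lambda \bar{x} + (1 - \lambda) \tilde{x}$. Let $y^*(\bar{x}, s)$, $y^*(\tilde{x}, s)$, and $y^*(\hat{x}, s)$ be optimal buffer levels after transmission in slot $l - 1$, for each of the respective starting points. We have:

\begin{align}
\lambda \cdot V_{l-1}(\bar{x}, s) + (1 - \lambda) \cdot V_{l-1}(\tilde{x}, s) &= -c_s^T \hat{x} + \lambda \cdot G_{l-1}(y^*(\bar{x}, s), s) + (1 - \lambda) \cdot G_{l-1}(y^*(\tilde{x}, s), s) \nonumber \\
&\ge -c_s^T \hat{x} + G_{l-1}\big( \lambda y^*(\bar{x}, s) + (1 - \lambda) y^*(\tilde{x}, s), s \big) \nonumber \\
&\ge -c_s^T \hat{x} + \min_{y \in \tilde{\mathcal{A}}^d(\hat{x}, s)} \{ G_{l-1}(y, s) \} \nonumber \\
&= V_{l-1}(\hat{x}, s) = V_{l-1}(\lambda \bar{x} + (1 - \lambda) \tilde{x}, s), \tag{109}
\end{align}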



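where the first inequality follows from the convexity of $G_{l-1}(\cdot, s)$ from the induction hypothesis. The second inequality follows from the following argument. $y^*(\bar{x}, s) \in \tilde{\mathcal{A}}^d(\bar{x}, s)$ implies:

\[ y^*(\bar{x}, s) \succeq d \vee \bar{x} \quad \text{and} \quad c_s^T [y^*(\bar{x}, s) - \bar{x}] \le P. \tag{110} \]

Similarly, $y^*(\tilde{x}, s) \in \tilde{\mathcal{A}}^d(\tilde{x}, s)$ implies

\[ y^*(\tilde{x}, s) \succeq d \vee \tilde{x} \quad \text{and} \quad c_s^T [y^*(\tilde{x}, s) - \tilde{x}] \le P. \tag{111} \]

Multiplying the equations in (110) by $\lambda$ and the equations in (111) by $1 - \lambda$, and summing, we have:

\[ \lambda y^*(\bar{x}, s) + (1 - \lambda) y^*(\tilde{x}, s) \succeq \lambda (d \vee \bar{x}) + (1 - \lambda)(d \vee \tilde{x}) \succeq d \vee \hat{x}, \tag{112} \]

and

\[ c_s^T \big[ \lambda y^*(\bar{x}, s) + (1 - \lambda) y^*(\tilde{x}, s) - \hat{x} \big] = \lambda c_s^T [y^*(\bar{x}, s) - \bar{x}] + (1 - \lambda) c_s^T [y^*(\tilde{x}, s) - \tilde{x}] \le P. \tag{113} \]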



From (112) and (113), we conclude $\lambda y^*(\bar{x}, s) + (1 - \lambda) y^*(\tilde{x}, s) \in \tilde{\mathcal{A}}^d(\hat{x}, s)$, as shown in Figure 8. Thus, the value of $G_{l-1}(\cdot, s)$ at this point is greater than or equal to the minimum of $G_{l-1}(\cdot, s)$ over the region $\tilde{\mathcal{A}}^d(\hat{x}, s)$. From (109), we conclude $V_{l-1}(\cdot, s)$ is convex. This is a similar argument to the one used by Evans to show convexity in [60].

Fig. 8. Diagram showing $\lambda y^*(\bar{x}, s) + (1 - \lambda) y^*(\tilde{x}, s) \in \tilde{\mathcal{A}}^d(\hat{x}, s)$ in the proof of the convexity of $V_{l-1}(\cdot, s)$. (Axes: buffer level of queue 1 before transmission, horizontal; buffer level of queue 2 before transmission, vertical.)

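(ii) Recall that $V_{l-1}(x, s) = -c_s^T x + \min_{y \in \tilde{\mathcal{A}}^d(x, s)} \{ G_{l-1}(y, s) \}$. The first term, $-c_s^T x$, is clearly supermodular in $x$, so it suffices to show that the second term, $\min_{y \in \tilde{\mathcal{A}}^d(x, s)} \{ G_{l-1}(y, s) \}$, is also supermodular in $x$. Let $\bar{x}, \tilde{x} \in \mathbb{R}^2_+$ be arbitrary. We want to show:

\[ \min_{y \in \tilde{\mathcal{A}}^d(\bar{x}, s)} \{ G_{l-1}(y, s) \} + \min_{y \in \tilde{\mathcal{A}}^d(\tilde{x}, s)} \{ G_{l-1}(y, s) \} \le \min_{y \in \tilde{\mathcal{A}}^d(\bar{x} \wedge \tilde{x}, s)} \{ G_{l-1}(y, s) \} + \min_{y \in \tilde{\mathcal{A}}^d(\bar{x} \vee \tilde{x}, s)} \{ G_{l-1}(y, s) \}. \tag{114} \]

If $\bar{x}$ and $\tilde{x}$ are comparable (i.e., $\bar{x}^1 \ge \tilde{x}^1$ and $\bar{x}^2 \ge \tilde{x}^2$, or $\bar{x}^1 \le \tilde{x}^1$ and $\bar{x}^2 \le \tilde{x}^2$), then (114) is trivial. So we assume they are not comparable, and also assume without loss of generality that $\bar{x}^1 < \tilde{x}^1$ and $\tilde{x}^2 < \bar{x}^2$. We begin with a quick lemma.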

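Lemma 4. There exist optimal buffer levels after transmission in slot $l - 1$, $y^*(\bar{x} \wedge \tilde{x}, s)$ and $y^*(\bar{x} \vee \tilde{x}, s)$, such that $y^*(\bar{x} \wedge \tilde{x}, s) \nsucc y^*(\bar{x} \vee \tilde{x}, s)$; i.e., such that $y^{*1}(\bar{x} \wedge \tilde{x}, s) \le y^{*1}(\bar{x} \vee \tilde{x}, s)$ or $y^{*2}(\bar{x} \wedge \tilde{x}, s) \le y^{*2}(\bar{x} \vee \tilde{x}, s)$.

Proof. Fix a choice of $y^*(\bar{x} \vee \tilde{x}, s)$ such that $G_{l-1}(y^*(\bar{x} \vee \tilde{x}, s), s) = \min_{y \in \tilde{\mathcal{A}}^d(\bar{x} \vee \tilde{x}, s)} \{ G_{l-1}(y, s) \}$. Assume that for all optimal choices of $y^*(\bar{x} \wedge \tilde{x}, s)$, we have $y^*(\bar{x} \wedge \tilde{x}, s) \succ y^*(\bar{x} \vee \tilde{x}, s)$. Fix one such choice of $y^*(\bar{x} \wedge \tilde{x}, s)$, and we have:

\[ y^*(\bar{x} \wedge \tilde{x}, s) \succ y^*(\bar{x} \vee \tilde{x}, s) \succeq d \vee (\bar{x} \vee \tilde{x}). \tag{115} \]

Further, $y^*(\bar{x} \wedge \tilde{x}, s) \in \tilde{\mathcal{A}}^d(\bar{x} \wedge \tilde{x}, s)$ implies $c_s^T [y^*(\bar{x} \wedge \tilde{x}, s) - \bar{x} \wedge \tilde{x}] \le P$, and thus:

\[ c_s^T [y^*(\bar{x} \wedge \tilde{x}, s) - \bar{x} \vee \tilde{x}] \le c_s^T [y^*(\bar{x} \wedge \tilde{x}, s) - \bar{x} \wedge \tilde{x}] \le P. \tag{116} \]

Equations (115) and (116) imply $y^*(\bar{x} \wedge \tilde{x}, s) \in \tilde{\mathcal{A}}^d(\bar{x} \vee \tilde{x}, s)$, and thus:

\[ G_{l-1}(y^*(\bar{x} \vee \tilde{x}, s), s) = \min_{y \in \tilde{\mathcal{A}}^d(\bar{x} \vee \tilde{x}, s)} \{ G_{l-1}(y, s) \} \le G_{l-1}(y^*(\bar{x} \wedge \tilde{x}, s), s). \tag{117} \]

However, we also have:

\[ y^*(\bar{x} \vee \tilde{x}, s) \succeq d \vee (\bar{x} \vee \tilde{x}) \succeq d \vee (\bar{x} \wedge \tilde{x}) \tag{118} \]

and

\[ c_s^T [y^*(\bar{x} \vee \tilde{x}, s) - \bar{x} \wedge \tilde{x}] \le c_s^T [y^*(\bar{x} \wedge \tilde{x}, s) - \bar{x} \wedge \tilde{x}] \le P. \tag{119} \]

Equations (118) and (119) imply $y^*(\bar{x} \vee \tilde{x}, s) \in \tilde{\mathcal{A}}^d(\bar{x} \wedge \tilde{x}, s)$, which, in combination with (117), implies it is optimal to move from $\bar{x} \wedge \tilde{x}$ to $y^*(\bar{x} \vee \tilde{x}, s)$, contradicting the assumption that $y^*(\bar{x} \wedge \tilde{x}, s) \succ y^*(\bar{x} \vee \tilde{x}, s)$ for all possible choices of $y^*(\bar{x} \wedge \tilde{x}, s)$. □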

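Now let $y^*(\bar{x} \wedge \tilde{x}, s)$ and $y^*(\bar{x} \vee \tilde{x}, s)$ be arbitrary optimal actions such that $y^*(\bar{x} \wedge \tilde{x}, s) \nsucc y^*(\bar{x} \vee \tilde{x}, s)$. We show (114) by considering two exhaustive cases.

Case 1: $y^*(\bar{x} \vee \tilde{x}, s) \succeq y^*(\bar{x} \wedge \tilde{x}, s)$
We start with another lemma.

Lemma 5. Let $f : [d^1, \infty) \times [d^2, \infty) \to \mathbb{R}$ be convex and supermodular, let $\sigma, \beta \in [0, 1]$ be arbitrary, and let $\bar{z} = (\bar{z}_1, \bar{z}_2) \preceq (\tilde{z}_1, \tilde{z}_2) = \tilde{z}$. Define $z^{\lambda_1, \lambda_2} := \big( \lambda_1 \bar{z}_1 + (1 - \lambda_1) \tilde{z}_1, \; \lambda_2 \bar{z}_2 + (1 - \lambda_2) \tilde{z}_2 \big)$. Then

\[ f(\bar{z}) + f(\tilde{z}) \ge f(z^{\sigma, \beta}) + f(z^{1 - \sigma, 1 - \beta}). \tag{120} \]

Proof.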

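Step 1: Assume $\sigma, \beta \le \frac{1}{2}$. Assume without loss of generality that $\sigma \le \beta$. By the convexity of $f(\cdot)$, we have:

\[ f(\bar{z}) + f(\tilde{z}) \ge f(z^{\sigma, \sigma}) + f(z^{1 - \sigma, 1 - \sigma}), \tag{121} \]

and

\[ f(z^{\sigma, \beta}) + f(z^{1 - \sigma, 1 - \beta}) \le \max \big\{ f(z^{\sigma, \sigma}) + f(z^{1 - \sigma, 1 - \sigma}), \; f(z^{\sigma, 1 - \sigma}) + f(z^{1 - \sigma, \sigma}) \big\}, \tag{122} \]

where (122) holds because $z^{\sigma, \beta}$ lies on the segment from $z^{\sigma, \sigma}$ to $z^{\sigma, 1 - \sigma}$, and $z^{1 - \sigma, 1 - \beta}$ lies at the matching position on the segment from $z^{1 - \sigma, 1 - \sigma}$ to $z^{1 - \sigma, \sigma}$. By the supermodularity of $f(\cdot)$, we have:

\[ f(z^{1 - \sigma, 1 - \sigma}) + f(z^{\sigma, \sigma}) \ge f(z^{\sigma, 1 - \sigma}) + f(z^{1 - \sigma, \sigma}). \tag{123} \]

Figure 9 shows these relationships. Combining (121)-(123), we have:

\[ f(\bar{z}) + f(\tilde{z}) \ge f(z^{\sigma, \beta}) + f(z^{1 - \sigma, 1 - \beta}). \]

Step 2: Now let $\sigma, \beta \in [0, 1]$, and define $\hat{\sigma} := \min\{\sigma, 1 - \sigma\}$ and $\hat{\beta} := \min\{\beta, 1 - \beta\}$. Then $\hat{\sigma}, \hat{\beta} \le \frac{1}{2}$, so by Step 1, we have:

\[ f(\bar{z}) + f(\tilde{z}) \ge f(z^{\hat{\sigma}, \hat{\beta}}) + f(z^{1 - \hat{\sigma}, 1 - \hat{\beta}}). \tag{124} \]

Note that $z^{\sigma, \beta} \wedge z^{1 - \sigma, 1 - \beta} = z^{1 - \hat{\sigma}, 1 - \hat{\beta}}$, and $z^{\sigma, \beta} \vee z^{1 - \sigma, 1 - \beta} = z^{\hat{\sigma}, \hat{\beta}}$, so by the supermodularity of $f(\cdot)$, we have:

\[ f(z^{\hat{\sigma}, \hat{\beta}}) + f(z^{1 - \hat{\sigma}, 1 - \hat{\beta}}) \ge f(z^{\sigma, \beta}) + f(z^{1 - \sigma, 1 - \beta}). \tag{125} \]

Combining (124) and (125) yields the desired result, (120). □

Fig. 9. Diagram of the points referred to in Step 1 of the proof of Lemma 5.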
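The inequality in Lemma 5 can be spot-checked numerically. The sketch below is only an illustration under stated assumptions: the test function `f`, the sampling ranges, and all function names are hypothetical choices, not taken from the paper. It samples ordered pairs of points and weights in $[0, 1]$ and reports the smallest observed value of the left-hand side minus the right-hand side of (120).

```python
import random

def f(z1, z2):
    # Assumed test function: convex and supermodular on R^2.
    # (z1 + z2)^2 has a nonnegative cross partial; the other terms are separable and convex.
    return (z1 + z2) ** 2 + 0.5 * z1 ** 2 + abs(z2 - 1.0)

def z_point(lam1, lam2, z_lo, z_hi):
    # Componentwise mix: weight lam on the smaller point, (1 - lam) on the larger point.
    return (lam1 * z_lo[0] + (1 - lam1) * z_hi[0],
            lam2 * z_lo[1] + (1 - lam2) * z_hi[1])

def lemma5_gap(sigma, beta, z_lo, z_hi):
    # Left-hand side minus right-hand side of the Lemma 5 inequality.
    a = z_point(sigma, beta, z_lo, z_hi)
    b = z_point(1 - sigma, 1 - beta, z_lo, z_hi)
    return f(*z_lo) + f(*z_hi) - f(*a) - f(*b)

def min_gap(trials=2000, seed=0):
    # Smallest gap over random ordered pairs z_lo <= z_hi and random weights.
    rng = random.Random(seed)
    worst = float("inf")
    for _ in range(trials):
        z_lo = (rng.uniform(0, 5), rng.uniform(0, 5))
        z_hi = (z_lo[0] + rng.uniform(0, 5), z_lo[1] + rng.uniform(0, 5))
        worst = min(worst, lemma5_gap(rng.random(), rng.random(), z_lo, z_hi))
    return worst
```

A nonnegative return value (up to rounding) is consistent with the lemma; it is evidence for one test function, not a proof.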



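Next, define the following points, shown in Figure 10:

\[ \bar{y} := \begin{pmatrix} \bar{x}^1 + \max\{ y^{*1}(\bar{x} \wedge \tilde{x}, s) - \bar{x}^1, \; y^{*1}(\bar{x} \vee \tilde{x}, s) - \tilde{x}^1 \} \\ \bar{x}^2 + \min\{ y^{*2}(\bar{x} \wedge \tilde{x}, s) - \tilde{x}^2, \; y^{*2}(\bar{x} \vee \tilde{x}, s) - \bar{x}^2 \} \end{pmatrix}, \qquad \tilde{y} := \begin{pmatrix} \tilde{x}^1 + \min\{ y^{*1}(\bar{x} \wedge \tilde{x}, s) - \bar{x}^1, \; y^{*1}(\bar{x} \vee \tilde{x}, s) - \tilde{x}^1 \} \\ \tilde{x}^2 + \max\{ y^{*2}(\bar{x} \wedge \tilde{x}, s) - \tilde{x}^2, \; y^{*2}(\bar{x} \vee \tilde{x}, s) - \bar{x}^2 \} \end{pmatrix}. \]

Note that $\bar{y} \succeq d \vee \bar{x}$ and $\tilde{y} \succeq d \vee \tilde{x}$. Furthermore, we have

\begin{align*}
c_s^T (\bar{y} - \bar{x}) &= c_s^T \begin{pmatrix} \max\{ y^{*1}(\bar{x} \wedge \tilde{x}, s) - \bar{x}^1, \; y^{*1}(\bar{x} \vee \tilde{x}, s) - \tilde{x}^1 \} \\ \min\{ y^{*2}(\bar{x} \wedge \tilde{x}, s) - \tilde{x}^2, \; y^{*2}(\bar{x} \vee \tilde{x}, s) - \bar{x}^2 \} \end{pmatrix} \\
&\le \max \left\{ c_s^T \begin{pmatrix} y^{*1}(\bar{x} \wedge \tilde{x}, s) - \bar{x}^1 \\ y^{*2}(\bar{x} \wedge \tilde{x}, s) - \tilde{x}^2 \end{pmatrix}, \; c_s^T \begin{pmatrix} y^{*1}(\bar{x} \vee \tilde{x}, s) - \tilde{x}^1 \\ y^{*2}(\bar{x} \vee \tilde{x}, s) - \bar{x}^2 \end{pmatrix} \right\} \\
&= \max \big\{ c_s^T [y^*(\bar{x} \wedge \tilde{x}, s) - (\bar{x} \wedge \tilde{x})], \; c_s^T [y^*(\bar{x} \vee \tilde{x}, s) - (\bar{x} \vee \tilde{x})] \big\} \le P.
\end{align*}

By a similar argument, $c_s^T (\tilde{y} - \tilde{x}) \le P$, and thus $\bar{y} \in \tilde{\mathcal{A}}^d(\bar{x}, s)$, and $\tilde{y} \in \tilde{\mathcal{A}}^d(\tilde{x}, s)$. So we have:

\[ \min_{y \in \tilde{\mathcal{A}}^d(\bar{x}, s)} \{ G_{l-1}(y, s) \} + \min_{y \in \tilde{\mathcal{A}}^d(\tilde{x}, s)} \{ G_{l-1}(y, s) \} \le G_{l-1}(\bar{y}, s) + G_{l-1}(\tilde{y}, s). \tag{126} \]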



Fig. 10. Construction of feasible points $\bar{y}$ and $\tilde{y}$ in Case 1 of the proof of supermodularity of $V_{l-1}(\cdot, s)$. (Axes: buffer level of queue 1, horizontal; buffer level of queue 2, vertical.)



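Now define

\[ \sigma := \frac{y^{*1}(\bar{x} \vee \tilde{x}, s) - \bar{y}^1}{y^{*1}(\bar{x} \vee \tilde{x}, s) - y^{*1}(\bar{x} \wedge \tilde{x}, s)}, \qquad \beta := \frac{y^{*2}(\bar{x} \vee \tilde{x}, s) - \bar{y}^2}{y^{*2}(\bar{x} \vee \tilde{x}, s) - y^{*2}(\bar{x} \wedge \tilde{x}, s)}. \]

Rearranging the definitions of $\sigma$ and $\beta$ yields:

\[ \bar{y} = \big( (1 - \sigma) \cdot y^{*1}(\bar{x} \vee \tilde{x}, s) + \sigma \cdot y^{*1}(\bar{x} \wedge \tilde{x}, s), \; (1 - \beta) \cdot y^{*2}(\bar{x} \vee \tilde{x}, s) + \beta \cdot y^{*2}(\bar{x} \wedge \tilde{x}, s) \big). \]

It is also straightforward to check that:

\[ \tilde{y} = \big( \sigma \cdot y^{*1}(\bar{x} \vee \tilde{x}, s) + (1 - \sigma) \cdot y^{*1}(\bar{x} \wedge \tilde{x}, s), \; \beta \cdot y^{*2}(\bar{x} \vee \tilde{x}, s) + (1 - \beta) \cdot y^{*2}(\bar{x} \wedge \tilde{x}, s) \big). \]

Footnote: If $y^{*1}(\bar{x} \vee \tilde{x}, s) - y^{*1}(\bar{x} \wedge \tilde{x}, s) = 0$, let $\sigma$ be arbitrary in $[0, 1]$. Similarly, if $y^{*2}(\bar{x} \vee \tilde{x}, s) - y^{*2}(\bar{x} \wedge \tilde{x}, s) = 0$, let $\beta$ be arbitrary in $[0, 1]$.

Note also that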



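\begin{align*}
y^{*1}(\bar{x} \wedge \tilde{x}, s) &= \min\{ y^{*1}(\bar{x} \wedge \tilde{x}, s), \; y^{*1}(\bar{x} \wedge \tilde{x}, s) + (\tilde{x}^1 - \bar{x}^1) \} \\
&\le \min\{ y^{*1}(\bar{x} \vee \tilde{x}, s), \; y^{*1}(\bar{x} \wedge \tilde{x}, s) + (\tilde{x}^1 - \bar{x}^1) \} = \tilde{y}^1 \le y^{*1}(\bar{x} \vee \tilde{x}, s),
\end{align*}

and thus, $\sigma \in [0, 1]$. Similarly, $y^{*2}(\bar{x} \wedge \tilde{x}, s) \le \bar{y}^2 \le y^{*2}(\bar{x} \vee \tilde{x}, s)$, and thus, $\beta \in [0, 1]$. Since $G_{l-1}(\cdot, s)$ is convex and supermodular, we can now apply Lemma 5 with $y^*(\bar{x} \wedge \tilde{x}, s)$ playing the role of $\bar{z}$; $y^*(\bar{x} \vee \tilde{x}, s)$ the role of $\tilde{z}$; $\bar{y}$ the role of $z^{\sigma, \beta}$; and $\tilde{y}$ the role of $z^{1 - \sigma, 1 - \beta}$, to get:

\begin{align}
G_{l-1}(\bar{y}, s) + G_{l-1}(\tilde{y}, s) &\le G_{l-1}(y^*(\bar{x} \wedge \tilde{x}, s), s) + G_{l-1}(y^*(\bar{x} \vee \tilde{x}, s), s) \nonumber \\
&= \min_{y \in \tilde{\mathcal{A}}^d(\bar{x} \wedge \tilde{x}, s)} \{ G_{l-1}(y, s) \} + \min_{y \in \tilde{\mathcal{A}}^d(\bar{x} \vee \tilde{x}, s)} \{ G_{l-1}(y, s) \}. \tag{127}
\end{align}

Combining equations (126) and (127) yields the desired result, (114).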
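The Case 1 construction can be traced on concrete numbers. The following sketch assumes the convention that the first starting point has the smaller first coordinate and the larger second coordinate; the function name, argument names, and the example values are all hypothetical and chosen only to illustrate the max/min swap of transmitted amounts.

```python
import numpy as np

def case1_points(x_a, x_b, y_meet, y_join):
    """Swap transmitted amounts between the two optimal actions.

    x_a, x_b : the two non-comparable starting points, with x_a[0] < x_b[0] and x_b[1] < x_a[1].
    y_meet   : optimal post-transmission level from the componentwise minimum of x_a, x_b.
    y_join   : optimal post-transmission level from the componentwise maximum of x_a, x_b.
    """
    x_a, x_b = np.asarray(x_a, float), np.asarray(x_b, float)
    y_meet, y_join = np.asarray(y_meet, float), np.asarray(y_join, float)
    sent_meet = y_meet - np.minimum(x_a, x_b)   # amounts sent from the meet
    sent_join = y_join - np.maximum(x_a, x_b)   # amounts sent from the join
    y_a = x_a + np.array([max(sent_meet[0], sent_join[0]),
                          min(sent_meet[1], sent_join[1])])
    y_b = x_b + np.array([min(sent_meet[0], sent_join[0]),
                          max(sent_meet[1], sent_join[1])])
    return y_a, y_b

# Assumed example: power costs c = (1, 2), budget P = 6.
c, P = np.array([1.0, 2.0]), 6.0
x_a, x_b = np.array([1.0, 3.0]), np.array([3.0, 1.0])
y_a, y_b = case1_points(x_a, x_b, y_meet=[2.0, 2.0], y_join=[5.0, 4.0])
```

In this example both constructed points respect the power budget from their own starting points, and their sum equals the sum of the two optimal actions, which is what allows Lemma 5 to be applied.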
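Case 2: $y^*(\bar{x} \vee \tilde{x}, s) \nsucceq y^*(\bar{x} \wedge \tilde{x}, s) \nsucc y^*(\bar{x} \vee \tilde{x}, s)$

There are two possibilities for this case. The first possibility is that $y^{*1}(\bar{x} \wedge \tilde{x}, s) > y^{*1}(\bar{x} \vee \tilde{x}, s)$ and $y^{*2}(\bar{x} \wedge \tilde{x}, s) \le y^{*2}(\bar{x} \vee \tilde{x}, s)$. The second possibility is that $y^{*1}(\bar{x} \wedge \tilde{x}, s) \le y^{*1}(\bar{x} \vee \tilde{x}, s)$ and $y^{*2}(\bar{x} \wedge \tilde{x}, s) > y^{*2}(\bar{x} \vee \tilde{x}, s)$. We show (114) under the first possibility, and a symmetric argument can be used to show (114) under the second possibility. We have:

\[ y^{*1}(\bar{x} \wedge \tilde{x}, s) > y^{*1}(\bar{x} \vee \tilde{x}, s) \ge \max\{ (\bar{x} \vee \tilde{x})^1, d^1 \} = \max\{ \tilde{x}^1, d^1 \}, \tag{128} \]

\[ y^{*2}(\bar{x} \wedge \tilde{x}, s) \ge \max\{ (\bar{x} \wedge \tilde{x})^2, d^2 \} = \max\{ \tilde{x}^2, d^2 \}, \tag{129} \]

and

\[ c_s^T [y^*(\bar{x} \wedge \tilde{x}, s) - \tilde{x}] \le c_s^T [y^*(\bar{x} \wedge \tilde{x}, s) - (\bar{x} \wedge \tilde{x})] \le P. \tag{130} \]

Equations (128), (129), and (130) imply $y^*(\bar{x} \wedge \tilde{x}, s) \in \tilde{\mathcal{A}}^d(\tilde{x}, s)$. If it also happens that $y^*(\bar{x} \vee \tilde{x}, s) \in \tilde{\mathcal{A}}^d(\bar{x}, s)$, then we have: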



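\begin{align*}
\min_{y \in \tilde{\mathcal{A}}^d(\bar{x}, s)} \{ G_{l-1}(y, s) \} + \min_{y \in \tilde{\mathcal{A}}^d(\tilde{x}, s)} \{ G_{l-1}(y, s) \} &\le G_{l-1}(y^*(\bar{x} \wedge \tilde{x}, s), s) + G_{l-1}(y^*(\bar{x} \vee \tilde{x}, s), s) \\
&= \min_{y \in \tilde{\mathcal{A}}^d(\bar{x} \wedge \tilde{x}, s)} \{ G_{l-1}(y, s) \} + \min_{y \in \tilde{\mathcal{A}}^d(\bar{x} \vee \tilde{x}, s)} \{ G_{l-1}(y, s) \}.
\end{align*}

Otherwise, define:

\[ \gamma := \frac{c_s^T [y^*(\bar{x} \vee \tilde{x}, s) - \bar{x}] - P}{c_s^T [y^*(\bar{x} \vee \tilde{x}, s) - y^*(\bar{x} \wedge \tilde{x}, s)]}. \]

From $y^*(\bar{x} \vee \tilde{x}, s) \notin \tilde{\mathcal{A}}^d(\bar{x}, s)$ and $y^*(\bar{x} \wedge \tilde{x}, s) \in \tilde{\mathcal{A}}^d(\bar{x} \wedge \tilde{x}, s)$, we know:

\[ c_s^T y^*(\bar{x} \vee \tilde{x}, s) > c_s^T \bar{x} + P \ge c_s^T (\bar{x} \wedge \tilde{x}) + P \ge c_s^T y^*(\bar{x} \wedge \tilde{x}, s). \tag{131} \]

It is clear from (131) that the numerator and denominator of $\gamma$ are positive, and $\gamma \in [0, 1]$. Now define:

\[ \bar{y} := \gamma y^*(\bar{x} \wedge \tilde{x}, s) + (1 - \gamma) y^*(\bar{x} \vee \tilde{x}, s), \quad \text{and} \quad \tilde{y} := (1 - \gamma) y^*(\bar{x} \wedge \tilde{x}, s) + \gamma y^*(\bar{x} \vee \tilde{x}, s). \]

It is somewhat tedious but straightforward to show that $\bar{y} \in \tilde{\mathcal{A}}^d(\bar{x}, s)$, and $\tilde{y} \in \tilde{\mathcal{A}}^d(\tilde{x}, s)$. Thus, we have:

\[ \min_{y \in \tilde{\mathcal{A}}^d(\bar{x}, s)} \{ G_{l-1}(y, s) \} + \min_{y \in \tilde{\mathcal{A}}^d(\tilde{x}, s)} \{ G_{l-1}(y, s) \} \le G_{l-1}(\bar{y}, s) + G_{l-1}(\tilde{y}, s). \tag{132} \]

In Figure 11, $\bar{y}$ is the point where the line segment connecting $y^*(\bar{x} \wedge \tilde{x}, s)$ and $y^*(\bar{x} \vee \tilde{x}, s)$ intersects the budget constraint (hypotenuse) of $\tilde{\mathcal{A}}^d(\bar{x}, s)$, and $\tilde{y}$ is a point along this line segment the same distance from $y^*(\bar{x} \wedge \tilde{x}, s)$ as $\bar{y}$ is from $y^*(\bar{x} \vee \tilde{x}, s)$. By the convexity of $G_{l-1}(\cdot, s)$ along this line segment, we have:

\begin{align}
G_{l-1}(\bar{y}, s) + G_{l-1}(\tilde{y}, s) &\le G_{l-1}(y^*(\bar{x} \wedge \tilde{x}, s), s) + G_{l-1}(y^*(\bar{x} \vee \tilde{x}, s), s) \nonumber \\
&= \min_{y \in \tilde{\mathcal{A}}^d(\bar{x} \wedge \tilde{x}, s)} \{ G_{l-1}(y, s) \} + \min_{y \in \tilde{\mathcal{A}}^d(\bar{x} \vee \tilde{x}, s)} \{ G_{l-1}(y, s) \}. \tag{133}
\end{align}

Combining (132) and (133) yields the desired result, (114).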
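The Case 2 construction is also easy to check on numbers. The sketch below uses hypothetical names and example values (they are assumptions for illustration, not data from the paper): it computes the mixing weight from the budget overshoot and forms the two mirrored points on the segment between the optimal actions.

```python
import numpy as np

def case2_points(c, P, x_a, y_meet, y_join):
    """Mirror two points on the segment between y_meet and y_join.

    c      : per-unit power costs for the two users.
    P      : power budget.
    x_a    : the starting point from which y_join violates the budget.
    y_meet : optimal action from the componentwise minimum of the two starting points.
    y_join : optimal action from the componentwise maximum of the two starting points.
    """
    c, x_a = np.asarray(c, float), np.asarray(x_a, float)
    y_meet, y_join = np.asarray(y_meet, float), np.asarray(y_join, float)
    gamma = (c @ (y_join - x_a) - P) / (c @ (y_join - y_meet))
    y_a = gamma * y_meet + (1 - gamma) * y_join      # lands on the budget line of x_a
    y_b = (1 - gamma) * y_meet + gamma * y_join      # mirrored point on the same segment
    return gamma, y_a, y_b

# Assumed example: c = (1, 1), P = 4, starting points (1, 3) and (3, 1).
c, P = np.array([1.0, 1.0]), 4.0
x_a, x_b = np.array([1.0, 3.0]), np.array([3.0, 1.0])
gamma, y_a, y_b = case2_points(c, P, x_a, y_meet=[4.0, 1.5], y_join=[3.5, 6.0])
```

Here the first constructed point uses exactly the full budget from its starting point, and the second stays within the budget from the other starting point, matching the geometric description of the intersection with the budget constraint.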



Fig. 11. Construction of feasible points $\bar{y}$ and $\tilde{y}$ in Case 2 of the proof of supermodularity of $V_{l-1}(\cdot, s)$. (Axes: buffer level of queue 1, horizontal; buffer level of queue 2, vertical.)



(iii) $G_l(y, s) = c_s^T y + h(y - d) + \alpha \cdot \mathbb{E}[V_{l-1}(y - d, S)]$. By (i), for all $s$, $V_{l-1}(x, s)$ is convex in $x$; thus, $V_{l-1}(y - d, s)$ is convex in $y$ as it is the composition of a convex function with an affine function. $\mathbb{E}[V_{l-1}(y - d, S)]$ is also convex as it is the nonnegative weighted sum/integral of convex functions. It follows that $G_l(y, s)$, the sum of convex functions, is convex in $y$.

(iv) Supermodularity of $G_l(y, s)$ follows from the same series of arguments as (iii), because, like convexity, supermodularity is preserved under addition and scalar multiplication (Smith and McCardle refer to these as closed convex cone properties [78]).

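(v) This step basically follows from Topkis' Theorem 2.8.1 [76, pg. 76], but, for the reader's benefit, we reproduce the proof here with our notation. Let $\bar{y}^2, \tilde{y}^2 \in [d^2, \infty)$ be arbitrary with $\bar{y}^2 \le \tilde{y}^2$. Let $\bar{y}^1 \in \arg\min_{y^1 \in [d^1, \infty)} \{ G_l(y^1, \bar{y}^2, s) \}$ and $\tilde{y}^1 \in \arg\min_{y^1 \in [d^1, \infty)} \{ G_l(y^1, \tilde{y}^2, s) \}$ be arbitrary. We want to show:

\[ \bar{y}^1 \wedge \tilde{y}^1 \in \arg\min_{y^1 \in [d^1, \infty)} \{ G_l(y^1, \tilde{y}^2, s) \}. \]

If $\tilde{y}^1 \le \bar{y}^1$, this is trivial, so we check that it is true for $\tilde{y}^1 > \bar{y}^1$. Since $\bar{y}^1$ is a minimizer of $G_l(\cdot, \bar{y}^2, s)$, we have:

\[ G_l(\bar{y}^1, \bar{y}^2, s) \le G_l(\tilde{y}^1, \bar{y}^2, s), \tag{134} \]

and since $\tilde{y}^1$ is a minimizer of $G_l(\cdot, \tilde{y}^2, s)$, we have:

\[ G_l(\tilde{y}^1, \tilde{y}^2, s) \le G_l(\bar{y}^1, \tilde{y}^2, s). \tag{135} \]

By the supermodularity of $G_l(\cdot, s)$, we have:

\begin{align*}
G_l(\tilde{y}^1, \bar{y}^2, s) + G_l(\bar{y}^1, \tilde{y}^2, s) &\le G_l(\bar{y}^1 \wedge \tilde{y}^1, \bar{y}^2 \wedge \tilde{y}^2, s) + G_l(\bar{y}^1 \vee \tilde{y}^1, \bar{y}^2 \vee \tilde{y}^2, s) \\
&= G_l(\bar{y}^1, \bar{y}^2, s) + G_l(\tilde{y}^1, \tilde{y}^2, s),
\end{align*}

or, rearranging terms:

\[ G_l(\tilde{y}^1, \bar{y}^2, s) - G_l(\bar{y}^1, \bar{y}^2, s) \le G_l(\tilde{y}^1, \tilde{y}^2, s) - G_l(\bar{y}^1, \tilde{y}^2, s). \tag{136} \]

Combining (134), (135), and (136) yields:

\[ 0 \le G_l(\tilde{y}^1, \bar{y}^2, s) - G_l(\bar{y}^1, \bar{y}^2, s) \le G_l(\tilde{y}^1, \tilde{y}^2, s) - G_l(\bar{y}^1, \tilde{y}^2, s) \le 0. \tag{137} \]

So (137) holds with equality throughout, implying $G_l(\bar{y}^1, \tilde{y}^2, s) = G_l(\tilde{y}^1, \tilde{y}^2, s)$, and we conclude:

\[ \bar{y}^1 \wedge \tilde{y}^1 = \bar{y}^1 \in \arg\min_{y^1 \in [d^1, \infty)} \{ G_l(y^1, \tilde{y}^2, s) \}. \]

Since $\bar{y}^1$ and $\tilde{y}^1$ were chosen arbitrarily, we have:

\[ \inf \Big\{ \arg\min_{y^1 \in [d^1, \infty)} \{ G_l(y^1, \bar{y}^2, s^1, s^2) \} \Big\} \ge \inf \Big\{ \arg\min_{y^1 \in [d^1, \infty)} \{ G_l(y^1, \tilde{y}^2, s^1, s^2) \} \Big\}. \]

The first implication in (v) follows from a symmetric argument. □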
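The monotone-minimizer property in (v) can be observed on a small grid. This is a minimal sketch under assumptions: the function `G` below is a hypothetical convex, supermodular stand-in (not the paper's cost-to-go function), and the grid bounds and names are illustrative only.

```python
import numpy as np

def G(y1, y2):
    # Assumed stand-in: convex and supermodular, since the cross partial of
    # (y1 + y2 - 5)^2 is 2 >= 0 and the extra term is linear in y1.
    return (y1 + y2 - 5.0) ** 2 + 0.5 * y1

def smallest_minimizer(y2, y1_grid):
    # np.argmin returns the first index attaining the minimum, i.e. the smallest minimizer.
    values = np.array([G(y1, y2) for y1 in y1_grid])
    return float(y1_grid[int(np.argmin(values))])

def minimizers_over_grid(d1=1.0, d2=1.0, upper=6.0, steps=101):
    # Smallest minimizer in y1 for each y2 on a grid over [d2, upper].
    y1_grid = np.linspace(d1, upper, steps)
    y2_grid = np.linspace(d2, upper, steps)
    return [smallest_minimizer(y2, y1_grid) for y2 in y2_grid]
```

The returned list is nonincreasing in the second coordinate: as the other user's target level rises, the smallest minimizing level for the first user does not increase.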



E. Proof of Theorem 8

Let $n \in \{1, 2, \ldots, N\}$ and $s \in \mathcal{S}$ be arbitrary. We start by proving (24). First, let $x \in \mathcal{R}_{I}(n, s)$ and $y \in \tilde{\mathcal{A}}^d(x, s)$ be arbitrary. We know from Theorem 7 that $G_n(\cdot, s)$ is convex on $[d^1, \infty) \times [d^2, \infty)$, which implies that $G_n(\cdot, s)$ is also convex on any line segment in $[d^1, \infty) \times [d^2, \infty)$ (see, e.g., [77, Theorem 4.1]). Specifically, by the convexity of $G_n(\cdot, s)$ along the line with first coordinate fixed at $y^1$, and the fact that $y^2 \ge x^2 \ge f_n^2(y^1, s)$, we have:

\[ G_n(y, s) \ge G_n((y^1, x^2), s) \ge G_n\big((y^1, f_n^2(y^1, s)), s\big). \tag{138} \]

Similarly, by the convexity of $G_n(\cdot, s)$ along the line with second coordinate fixed at $x^2$, and the fact that $y^1 \ge x^1 \ge f_n^1(x^2, s)$, we have:

\[ G_n((y^1, x^2), s) \ge G_n(x, s) \ge G_n\big((f_n^1(x^2, s), x^2), s\big). \tag{139} \]

Combining (138) and (139) yields:

\[ G_n(y, s) \ge G_n((y^1, x^2), s) \ge G_n(x, s), \]

and we conclude $G_n(x, s) = \min_{y \in \tilde{\mathcal{A}}^d(x, s)} \{ G_n(y, s) \}$.

Second, let $x \in \mathcal{R}_{II}(n, s)$ be arbitrary. Then $b_n(s) \in \tilde{\mathcal{A}}^d(x, s)$ and $b_n(s)$ is a global minimizer of $G_n(\cdot, s)$, so it is clearly optimal to transmit to bring the receivers' buffer levels up to $b_n(s)$.

Next, let $x \in \mathcal{R}_{III\text{-}A}(n, s)$ and $y \in \tilde{\mathcal{A}}^d(x, s)$ be arbitrary. By definition of $f_n^1(\cdot, s)$, we have:

\[ G_n(y, s) \ge G_n\big((f_n^1(y^2, s), y^2), s\big). \tag{140} \]

Furthermore, the function $\min_{y^1 \in [d^1, \infty)} \{ G_n((y^1, y^2), s) \}$ is convex in $y^2$ since $[d^1, \infty)$ is a convex set (see, e.g., [75, pp. 101-102]). Thus, $y^2 \ge x^2 \ge b_n^2(s)$ implies:

\begin{align}
G_n\big((f_n^1(y^2, s), y^2), s\big) &\ge G_n\big((f_n^1(x^2, s), x^2), s\big) \nonumber \\
&\ge G_n\big((f_n^1(b_n^2(s), s), b_n^2(s)), s\big) \nonumber \\
&= G_n(b_n(s), s). \tag{141}
\end{align}

Combining (140) and (141) yields:

\[ G_n(y, s) \ge G_n\big((f_n^1(x^2, s), x^2), s\big), \]

and $x \in \mathcal{R}_{III\text{-}A}(n, s)$ implies $(f_n^1(x^2, s), x^2) \in \tilde{\mathcal{A}}^d(x, s)$. Since $y \in \tilde{\mathcal{A}}^d(x, s)$ was arbitrary, we conclude $y^*(x, s) = (f_n^1(x^2, s), x^2)$ is optimal.

The optimality of $y^*(x, s) = (x^1, f_n^2(x^1, s))$ for $x \in \mathcal{R}_{III\text{-}B}(n, s)$ follows from a symmetric argument, using the convexity of $G_n(\cdot, s)$ along the curve $(x^1, f_n^2(x^1, s))$.



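Finally, we prove (25). Define:

\[ \mathcal{H}^d(x, s) := \big\{ y \in [d^1, \infty) \times [d^2, \infty) : y \succeq x \text{ and } c_s^T [y - x] = P \big\} \subset \tilde{\mathcal{A}}^d(x, s). \]

First, let $x \in \mathcal{R}_{IV\text{-}B}(n, s)$ and $y \in \tilde{\mathcal{A}}^d(x, s)$ be arbitrary such that $c_s^T [y - x] < P$. Define

\[ \lambda_0 := \frac{c_s^T b_n(s) - c_s^T x - P}{c_s^T b_n(s) - c_s^T y}. \]

Note that $c_s^T [y - x] < P$ and $c_s^T [b_n(s) - x] > P$ imply $\lambda_0 \in (0, 1)$. Then define:

\[ \hat{y} := \lambda_0 y + (1 - \lambda_0) b_n(s). \]

By the convexity of $G_n(\cdot, s)$ along the line segment from $y$ to $b_n(s)$, we have:

\[ G_n(y, s) \ge G_n(\hat{y}, s) \ge G_n(b_n(s), s). \]

Since $y \in \tilde{\mathcal{A}}^d(x, s)$ was arbitrary, we conclude:

\[ \min_{y \in \tilde{\mathcal{A}}^d(x, s)} \{ G_n(y, s) \} = \min_{y \in \mathcal{H}^d(x, s)} \{ G_n(y, s) \}. \]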



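Next, let $x \in \mathcal{R}_{IV\text{-}C}(n, s)$ and $y \in \tilde{\mathcal{A}}^d(x, s)$ be arbitrary such that $c_s^T [y - x] < P$. We consider two exhaustive cases, and for each case, we construct a $\hat{y} \in \mathcal{H}^d(x, s)$ such that $G_n(\hat{y}, s) \le G_n(y, s)$.

Case 1: $y^2 < f_n^2(y^1, s)$ and $\tilde{y} := (y^1, f_n^2(y^1, s)) \notin \tilde{\mathcal{A}}^d(x, s)$

Let $\hat{y} := \Big( y^1, \; x^2 + \frac{P - c_{s^1} (y^1 - x^1)}{c_{s^2}} \Big)$. Then, by the convexity of $G_n(\cdot, s)$ along the line with first coordinate fixed at $y^1$, the definition of $f_n^2(y^1, s)$, and $y^2 < \hat{y}^2 < f_n^2(y^1, s)$, we have:

\[ G_n(\tilde{y}, s) = G_n\big((y^1, f_n^2(y^1, s)), s\big) \le G_n(\hat{y}, s) \le G_n(y, s). \]

It is also straightforward to check that $\hat{y} \in \mathcal{H}^d(x, s)$, as desired.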
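Case 2: All other $y \in \tilde{\mathcal{A}}^d(x, s)$ such that $c_s^T [y - x] < P$
By the definition of $f_n^2(y^1, s)$, we have:

\[ G_n(y, s) \ge G_n\big((y^1, f_n^2(y^1, s)), s\big). \tag{142} \]

Define:

\[ \hat{y}^1 := \sup \big\{ y^{1\prime} \in [x^1, y^1) : c_s^T \big( y^{1\prime}, f_n^2(y^{1\prime}, s) \big) \ge c_s^T x + P \big\}, \quad \text{and} \quad \hat{y}^2 := x^2 + \frac{P - c_{s^1} (\hat{y}^1 - x^1)}{c_{s^2}}. \]

By the convexity of $G_n(\cdot, s)$ along the curve $(y^1, f_n^2(y^1, s))$, we have:

\[ G_n\big((y^1, f_n^2(y^1, s)), s\big) \ge G_n\big((\hat{y}^1, f_n^2(\hat{y}^1, s)), s\big). \tag{143} \]

Furthermore, we have:

\[ G_n\big((\hat{y}^1, f_n^2(\hat{y}^1, s)), s\big) = G_n\big((\hat{y}^1, \hat{y}^2), s\big) = G_n(\hat{y}, s). \tag{144} \]

If $\hat{y}^2 = f_n^2(\hat{y}^1, s)$, (144) is trivial. Otherwise, there is a discontinuity in $f_n^2(\cdot, s)$ at $\hat{y}^1$, and we have:

\[ \lim_{y^1 \searrow \hat{y}^1} f_n^2(y^1, s) \le \hat{y}^2 \le \lim_{y^1 \nearrow \hat{y}^1} f_n^2(y^1, s), \tag{145} \]

with at least one of the inequalities being strict. Nonetheless, $G_n\big((y^1, f_n^2(y^1, s)), s\big)$ is a continuous function of $y^1$, and therefore:

\[ G_n\Big( \big(\hat{y}^1, \lim_{y^1 \searrow \hat{y}^1} f_n^2(y^1, s)\big), s \Big) = G_n\Big( \big(\hat{y}^1, \lim_{y^1 \nearrow \hat{y}^1} f_n^2(y^1, s)\big), s \Big). \tag{146} \]

The convexity of $G_n(\cdot, s)$ along the line with first coordinate fixed at $\hat{y}^1$ and (146) imply:

\[ G_n\big((\hat{y}^1, y^2), s\big) = G_n\big((\hat{y}^1, f_n^2(\hat{y}^1, s)), s\big), \quad \forall y^2 \in \Big[ \lim_{y^1 \searrow \hat{y}^1} f_n^2(y^1, s), \; \lim_{y^1 \nearrow \hat{y}^1} f_n^2(y^1, s) \Big], \]

which in combination with (145) implies (144). Combining (142)-(144) yields the desired result: $G_n(\hat{y}, s) \le G_n(y, s)$ for a $\hat{y} \in \mathcal{H}^d(x, s)$.

The validity of (25) for $x \in \mathcal{R}_{IV\text{-}A}(n, s)$ follows from a symmetric argument, completing the proof of (25) and Theorem 8. □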

IX. Appendix B - Infinite Horizon Discounted Expected Cost Proofs 

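A. Proof of Theorem 5

Our line of analysis is similar in spirit to [53], [79], and [80, Chapter 8]. Let $x \in \mathbb{R}_+$ and $s \in \mathcal{S}$ be arbitrary. First, we show inductively that $V_1(x, s) \le V_2(x, s) \le \ldots \le V_n(x, s) \le V_{n+1}(x, s) \le \ldots$.

Base Case: $n = 1$

\begin{align*}
V_1(x, s) &= \min_{\max(0, d - x) \le z \le \tilde{z}_{\max}(s)} \{ c(z, s) + h(x + z - d) \} \\
&\le \min_{\max(0, d - x) \le z \le \tilde{z}_{\max}(s)} \big\{ c(z, s) + h(x + z - d) + \alpha \cdot \mathbb{E}[V_1(x + z - d, S_1) \mid S_2 = s] \big\} \\
&= V_2(x, s),
\end{align*}

where the inequality follows from $V_1(x, s) \ge 0$, $\forall x, \forall s$.

Induction Step: Assume $V_n(x, s) \le V_{n+1}(x, s)$ for $n = 1, 2, \ldots, m - 1$. We show it is true for $n = m$:

\begin{align*}
V_m(x, s) &= \min_{\max(0, d - x) \le z \le \tilde{z}_{\max}(s)} \big\{ c(z, s) + h(x + z - d) + \alpha \cdot \mathbb{E}[V_{m-1}(x + z - d, S_{m-1}) \mid S_m = s] \big\} \\
&\le \min_{\max(0, d - x) \le z \le \tilde{z}_{\max}(s)} \big\{ c(z, s) + h(x + z - d) + \alpha \cdot \mathbb{E}[V_m(x + z - d, S_m) \mid S_{m+1} = s] \big\} \\
&= V_{m+1}(x, s),
\end{align*}

where the inequality follows from the induction hypothesis and the homogeneity of the Markov process representing the channel condition. So, for every $x \in \mathbb{R}_+$ and $s \in \mathcal{S}$, $\{V_n(x, s)\}_{n = 1, 2, \ldots}$ is a nondecreasing sequence.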

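Next, consider a policy $\pi^d$ transmitting $d$ packets in every slot, regardless of channel condition. Define:

\[ \tilde{c}_{\max} := \sup_{s \in \mathcal{S}, \; k \in \{0, 1, \ldots, K\}} \{ \tilde{c}_k(s) \} < \infty. \tag{147} \]

Then we have:

\[ V_n(x, s) \le V_n^{\pi^d}(x, s) \le \big( \tilde{c}_{\max} \cdot d + h(x) \big) \sum_{t=0}^{n-1} \alpha^t \le \big( \tilde{c}_{\max} \cdot d + h(x) \big) \cdot \frac{1}{1 - \alpha} < \infty, \]

so $\{V_n(x, s)\}_{n = 1, 2, \ldots}$ is a bounded nondecreasing sequence, implying $\lim_{n \to \infty} V_n(x, s)$ exists and is finite, $\forall x \in \mathbb{R}_+, \forall s \in \mathcal{S}$.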
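The monotone, bounded behaviour of the finite horizon value functions can be observed numerically. The following is a minimal sketch under simplifying assumptions that are not from the paper: integer buffer levels truncated at `x_max`, a linear power cost `costs[s] * z` in each of two channel states, a linear holding cost `h * x`, and an illustrative transition matrix `P`. All names and parameter values are hypothetical.

```python
import numpy as np

def value_iteration(n_iter, d=1, z_max=3, x_max=10, h=0.1, alpha=0.9,
                    costs=(1.0, 3.0), P=((0.7, 0.3), (0.4, 0.6))):
    """Return [V_0, V_1, ..., V_n_iter], each of shape (x_max + 1, number of channel states)."""
    costs = np.asarray(costs, dtype=float)
    P = np.asarray(P, dtype=float)
    n_s = len(costs)
    V = np.zeros((x_max + 1, n_s))           # V_0 = 0
    history = [V.copy()]
    for _ in range(n_iter):
        EV = V @ P.T                          # EV[x, s] = sum over s' of P[s, s'] * V[x, s']
        V_new = np.empty_like(V)
        for x in range(x_max + 1):
            z_lo = max(0, d - x)              # strict underflow constraint
            z_hi = min(z_max, x_max + d - x)  # power limit and grid truncation
            for s in range(n_s):
                V_new[x, s] = min(
                    costs[s] * z + h * (x + z - d) + alpha * EV[x + z - d, s]
                    for z in range(z_lo, z_hi + 1)
                )
        V = V_new
        history.append(V.copy())
    return history

history = value_iteration(60)
```

In this toy model, transmitting exactly `d` packets per slot is always feasible and keeps the buffer at its starting level, so each iterate stays below `(max(costs) * d + h * x) / (1 - alpha)`, mirroring the bound above.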



We now move on to part (b). Recall from Section VIII-A that $V_n(x, s)$ is convex in $x$, for all $n$ and all $s$. Define $V_\infty(x, s) := \lim_{n \to \infty} V_n(x, s)$. Let $s \in \mathcal{S}$ be arbitrary, but fixed. $V_\infty(x, s) = \sup_{n \in \mathbb{N}} V_n(x, s)$, so $V_\infty(x, s)$ is convex in $x$ as it is the pointwise supremum of the convex functions $\{V_n(x, s)\}_{n = 1, 2, \ldots}$.

Define $g_\infty : [d, \infty) \times \mathcal{S} \to \mathbb{R}_+$ by

\begin{align}
g_\infty(y, s) &:= h(y - d) + \alpha \cdot \mathbb{E}[V_\infty(y - d, S') \mid S = s] \nonumber \\
&= h(y - d) + \alpha \cdot \mathbb{E}\big[ \lim_{n \to \infty} V_n(y - d, S') \mid S = s \big] \nonumber \\
&= h(y - d) + \alpha \cdot \lim_{n \to \infty} \mathbb{E}[V_n(y - d, S') \mid S = s] \tag{148} \\
&= \lim_{n \to \infty} g_n(y, s), \nonumber
\end{align}

where (148) follows from the homogeneity of the Markov process representing the channel condition and the Monotone Convergence Theorem. Furthermore, for each $s \in \mathcal{S}$, $g_\infty(y, s)$ is convex in $y$ and $\lim_{y \to \infty} g_\infty(y, s) \ge \lim_{y \to \infty} h(y - d) = \infty$. Thus, for every $s$, at least one finite number achieves the global minimum of $g_\infty(y, s)$.

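Next, we proceed to part (d), and let $s \in \mathcal{S}$ be arbitrary. Define $b_{\infty, -1}(s) := \infty$ and

\[ b_{\infty, k}(s) := \max \big\{ d, \inf \{ b \mid g_\infty'^{+}(b, s) \ge -\tilde{c}_k(s) \} \big\}, \quad \forall k \in \{0, 1, \ldots, K\}. \]

Clearly, $b_{\infty, -1}(s) = \lim_{n \to \infty} b_{n, -1}(s)$, as $b_{n, -1}(s) := \infty$ for every $n$. Let $k \in \{0, 1, \ldots, K\}$ be arbitrary. We want to show:

\begin{align*}
\lim_{n \to \infty} b_{n, k}(s) &= \lim_{n \to \infty} \max \big\{ d, \inf \{ b \mid g_n'^{+}(b, s) \ge -\tilde{c}_k(s) \} \big\} \\
&= \max \big\{ d, \inf \{ b \mid g_\infty'^{+}(b, s) \ge -\tilde{c}_k(s) \} \big\} =: b_{\infty, k}(s).
\end{align*}

By the continuity of $\max\{d, \cdot\}$, it suffices to show:

\[ \lim_{n \to \infty} \inf \{ b \mid g_n'^{+}(b, s) \ge -\tilde{c}_k(s) \} = \inf \{ b \mid g_\infty'^{+}(b, s) \ge -\tilde{c}_k(s) \}. \tag{149} \]

Before proceeding to show (149), we present a lemma due to Sobel [81, Lemma 3, pg. 732], which is also presented in [80, Lemma 8-5, pg. 425].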

Lemma 6 (Sobel, 1971). Let $g, g_1, g_2, \ldots$ be convex functions on an open convex subset $X$ of $\mathbb{R}$ such that $g_n(x) \to g(x)$ as $n \to \infty$ and $g_n(x) \le g_{n+1}(x)$ for all $n$ and $x$. Let $g_n'^{-}(x)$ and $g'^{-}(x)$ denote derivatives from the left and $g_n'^{+}(x)$ and $g'^{+}(x)$ denote derivatives from the right. Then for all $x \in X$:

\[ g'^{-}(x) \le \liminf_{n \to \infty} g_n'^{-}(x) \le \limsup_{n \to \infty} g_n'^{+}(x) \le g'^{+}(x). \tag{150} \]



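We now prove (149) by contradiction. Define:

\[ \hat{b}_{n, k}(s) := \inf \{ b \mid g_n'^{+}(b, s) \ge -\tilde{c}_k(s) \}, \quad \text{and} \quad \hat{b}_{\infty, k}(s) := \inf \{ b \mid g_\infty'^{+}(b, s) \ge -\tilde{c}_k(s) \}. \]

First, assume $\liminf_{n \to \infty} \hat{b}_{n, k}(s) < \hat{b}_{\infty, k}(s)$, so there exists an $x_0 \in \mathbb{R}_+$ such that $d < x_0 < \hat{b}_{\infty, k}(s)$, and a sequence $\{n_i\}_{i = 1, 2, \ldots}$ such that $\lim_{i \to \infty} \hat{b}_{n_i, k}(s) = x_0$. Then we have:

\begin{align}
-\tilde{c}_k(s) &\le \limsup_{i \to \infty} g_{n_i}'^{+}(x_0, s) \tag{151} \\
&\le g_\infty'^{+}(x_0, s). \tag{152}
\end{align}

Here, (151) follows from $\lim_{i \to \infty} \hat{b}_{n_i, k}(s) = x_0$, and the fact that $g_{n_i}'^{+}(\cdot, s)$ is continuous from the right. Equation (152) follows from Lemma 6. Yet, $g_\infty'^{+}(x_0, s) \ge -\tilde{c}_k(s)$ implies $\hat{b}_{\infty, k}(s) \le x_0$, which is a contradiction. We conclude:

\[ \liminf_{n \to \infty} \hat{b}_{n, k}(s) \ge \hat{b}_{\infty, k}(s). \tag{153} \]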



Next, assume $\limsup_{n \to \infty} \hat{b}_{n, k}(s) > \hat{b}_{\infty, k}(s) \ge d$, and define:

\[ x_1 := \frac{\limsup_{n \to \infty} \hat{b}_{n, k}(s) + \hat{b}_{\infty, k}(s)}{2}. \]

Then we have:

\begin{align}
-\tilde{c}_k(s) &\le g_\infty'^{+}(\hat{b}_{\infty, k}(s), s) \tag{154} \\
&\le g_\infty'^{-}(x_1, s) \tag{155} \\
&\le \liminf_{n \to \infty} g_n'^{-}(x_1, s) \tag{156} \\
&\le \liminf_{n \to \infty} g_n'^{+}(x_1, s). \tag{157}
\end{align}

Here, (154) follows from the fact that $g_\infty'^{+}(\cdot, s)$ is continuous from the right; (156) follows from Lemma 6; and (155) and (157) follow from the fact (see, e.g., [77, pg. 228]) that for a proper convex function $f$ on $\mathbb{R}$, $z_1 < x < z_2$ implies:

\[ f'^{+}(z_1) \le f'^{-}(x) \le f'^{+}(x) \le f'^{-}(z_2). \]

$\liminf_{n \to \infty} g_n'^{+}(x_1, s) \ge -\tilde{c}_k(s)$ implies that for every convergent sequence $\{n_i\}_{i = 1, 2, \ldots}$, we have:

\[ \lim_{i \to \infty} g_{n_i}'^{+}(x_1, s) \ge -\tilde{c}_k(s), \]

and, in turn:

\[ \lim_{i \to \infty} \hat{b}_{n_i, k}(s) \le x_1. \]

Therefore, $\limsup_{n \to \infty} \hat{b}_{n, k}(s) \le x_1$, which is a contradiction. We conclude:

\[ \limsup_{n \to \infty} \hat{b}_{n, k}(s) \le \hat{b}_{\infty, k}(s). \tag{158} \]

Equations (153) and (158) imply (149).

Footnote: One hypothesis of Lemma 6 is that all functions are defined on an open convex subset of $\mathbb{R}$. While our functions $g_\infty(\cdot, s)$ and $\{g_n(\cdot, s)\}_{n \in \mathbb{N}}$ are defined on $[d, \infty)$, we only apply Lemma 6 at the points $x_0, x_1 \in (d, \infty)$. Thus, equations (152) and (156) follow from the application of Lemma 6 to the restrictions of the functions $g_\infty(\cdot, s)$ and $\{g_n(\cdot, s)\}_{n \in \mathbb{N}}$ to the domain $(d, \infty)$.

We are now ready to prove parts (e) and (f) of Theorem 5. Define

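\[ z^*_\infty(x, s) := \begin{cases} \tilde{z}_{k-1}(s), & \text{if } b_{\infty, k}(s) - \tilde{z}_{k-1}(s) \le x < b_{\infty, k-1}(s) - \tilde{z}_{k-1}(s), \; k \in \{0, 1, \ldots, K\} \\ b_{\infty, k}(s) - x, & \text{if } b_{\infty, k}(s) - \tilde{z}_k(s) \le x < b_{\infty, k}(s) - \tilde{z}_{k-1}(s), \; k \in \{0, 1, \ldots, K - 1\} \\ b_{\infty, K}(s) - x, & \text{if } b_{\infty, K}(s) - \tilde{z}_{\max}(s) \le x < b_{\infty, K}(s) - \tilde{z}_{K-1}(s) \\ \tilde{z}_{\max}(s), & \text{if } x < b_{\infty, K}(s) - \tilde{z}_{\max}(s) \end{cases} \]

Clearly, $\lim_{n \to \infty} b_{n, k}(s) = b_{\infty, k}(s)$ implies $\lim_{n \to \infty} z^*_n(x, s) = z^*_\infty(x, s)$, $\forall x \in \mathbb{R}_+, \forall s \in \mathcal{S}$. Furthermore, $g_n(y, s) \to g_\infty(y, s)$ and $z^*_n(x, s) \to z^*_\infty(x, s)$ as $n \to \infty$ imply:

\[ \lim_{n \to \infty} g_n(x + z^*_n(x, s), s) = g_\infty(x + z^*_\infty(x, s), s), \quad \forall x \in \mathbb{R}_+, \forall s \in \mathcal{S}. \tag{159} \]

So for all $x \in \mathbb{R}_+$ and $s \in \mathcal{S}$, we have:

\begin{align}
V_\infty(x, s) &= \lim_{n \to \infty} V_n(x, s) \nonumber \\
&= \lim_{n \to \infty} \min_{\max(0, d - x) \le z \le \tilde{z}_{\max}(s)} \{ c(z, s) + g_n(x + z, s) \} \nonumber \\
&= \lim_{n \to \infty} \{ c(z^*_n(x, s), s) + g_n(x + z^*_n(x, s), s) \} \tag{160} \\
&= c(z^*_\infty(x, s), s) + g_\infty(x + z^*_\infty(x, s), s) \tag{161} \\
&= \min_{\max(0, d - x) \le z \le \tilde{z}_{\max}(s)} \{ c(z, s) + g_\infty(x + z, s) \} \tag{162} \\
&= \min_{\max(0, d - x) \le z \le \tilde{z}_{\max}(s)} \big\{ c(z, s) + h(x + z - d) + \alpha \cdot \mathbb{E}[V_\infty(x + z - d, S') \mid S = s] \big\}. \nonumber
\end{align}

Equation (160) follows from Theorem 1, and (161) follows from (159) and the continuity of $c(\cdot, s)$. Equation (162) follows from the same line of analysis as part (ii) of the induction step in the proof of Theorem 3, with $g_\infty(\cdot, s)$, $b_{\infty, k}(s)$, and $z^*_\infty(\cdot, s)$ replacing $g_m(\cdot, s)$, $b_{m, k}(s)$, and $z^*_m(\cdot, s)$, respectively. Thus, $V_\infty(\cdot, \cdot)$, the limit of the finite horizon value functions, satisfies the $\alpha$-DCOE (19) and is also equal to the infinite horizon discounted expected cost-to-go resulting from the stationary policy $\pi^*_\infty := (z^*_\infty, z^*_\infty, \ldots)$. We conclude $\pi^*_\infty$, the natural extension of the finite horizon optimal policy, is optimal for the infinite horizon problem (see, for example, [44, Propositions 9.12 and 9.16]). □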

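B. Proof of Theorem 9

We follow the same line of analysis as the proof of Theorem 5. Let $x \in \mathbb{R}^2_+$ and $s \in \mathcal{S}$ be arbitrary. First, we show inductively that $V_1(x, s) \le V_2(x, s) \le \ldots \le V_n(x, s) \le V_{n+1}(x, s) \le \ldots$.

Base Case: $n = 1$

\begin{align*}
V_1(x, s) &= \min_{z \in \mathcal{A}^d(x, s)} \{ c_s^T z + h(x + z - d) \} \\
&\le \min_{z \in \mathcal{A}^d(x, s)} \big\{ c_s^T z + h(x + z - d) + \alpha \cdot \mathbb{E}[V_1(x + z - d, S_1) \mid S_2 = s] \big\} \\
&= V_2(x, s),
\end{align*}

where the inequality follows from $V_1(x, s) \ge 0$, $\forall x, \forall s$.

Induction Step: Assume $V_n(x, s) \le V_{n+1}(x, s)$ for $n = 1, 2, \ldots, m - 1$. We show it is true for $n = m$:

\begin{align*}
V_m(x, s) &= \min_{z \in \mathcal{A}^d(x, s)} \big\{ c_s^T z + h(x + z - d) + \alpha \cdot \mathbb{E}[V_{m-1}(x + z - d, S_{m-1}) \mid S_m = s] \big\} \\
&\le \min_{z \in \mathcal{A}^d(x, s)} \big\{ c_s^T z + h(x + z - d) + \alpha \cdot \mathbb{E}[V_m(x + z - d, S_m) \mid S_{m+1} = s] \big\} \\
&= V_{m+1}(x, s),
\end{align*}

where the inequality follows from the induction hypothesis and the homogeneity of the Markov process representing the channel condition. So, for every $x \in \mathbb{R}^2_+$ and $s \in \mathcal{S}$, $\{V_n(x, s)\}_{n = 1, 2, \ldots}$ is a nondecreasing sequence.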

Next, consider a policy $\pi^d$ transmitting $d^1$ packets to user 1 and $d^2$ packets to user 2 in every slot, regardless of channel condition. Define:

\[ c_{\max} := (c^1_{\max}, c^2_{\max})^T, \quad \text{where } c^m_{\max} := \sup_{s^m \in \mathcal{S}^m} \{ c_{s^m} \} < \infty. \tag{163} \]

Then we have:

\[ V_n(x, s) \le V_n^{\pi^d}(x, s) \le \big( c_{\max}^T d + h(x) \big) \sum_{t=0}^{n-1} \alpha^t \le \big( c_{\max}^T d + h(x) \big) \sum_{t=0}^{\infty} \alpha^t < \infty, \]

so $\{V_n(x, s)\}_{n = 1, 2, \ldots}$ is a bounded nondecreasing sequence, implying $\lim_{n \to \infty} V_n(x, s)$ exists and is finite, $\forall x \in \mathbb{R}^2_+, \forall s \in \mathcal{S}$.

Next, recall from Theorem 7 that $V_n(x, s)$ is convex and supermodular in $x$, for all $n$ and all $s$. Define $V_\infty(x, s) := \lim_{n \to \infty} V_n(x, s)$. Let $s \in \mathcal{S}$ be arbitrary, but fixed. $V_\infty(x, s) = \sup_{n \in \mathbb{N}} V_n(x, s)$, so $V_\infty(x, s)$ is convex in $x$ as it is the pointwise supremum of the convex functions $\{V_n(x, s)\}_{n = 1, 2, \ldots}$. Furthermore, the pointwise limit of supermodular functions is supermodular (see, e.g., [76, Lemma 2.6.1]), so $V_\infty(x, s)$ is also supermodular in $x$.

Define $G_\infty : [d^1, \infty) \times [d^2, \infty) \times \mathcal{S} \to \mathbb{R}_+$ by

\begin{align}
G_\infty(y, s) &:= c_s^T y + h(y - d) + \alpha \cdot \mathbb{E}[V_\infty(y - d, S') \mid S = s] \nonumber \\
&= c_s^T y + h(y - d) + \alpha \cdot \mathbb{E}\big[ \lim_{n \to \infty} V_n(y - d, S') \mid S = s \big] \nonumber \\
&= c_s^T y + h(y - d) + \alpha \cdot \lim_{n \to \infty} \mathbb{E}[V_n(y - d, S') \mid S = s] \tag{164} \\
&= \lim_{n \to \infty} G_n(y, s), \tag{165}
\end{align}

where (164) follows from the homogeneity of the Markov process representing the channel condition and the Monotone Convergence Theorem. Furthermore, for each $s \in \mathcal{S}$, $G_\infty(y, s)$ is convex and supermodular in $y$ as it is the sum of an affine function of $y$, a convex separable function of $y - d$, and a weighted sum of the convex supermodular functions $V_\infty(y - d, s')$. Additionally, $\lim_{\|y\| \to \infty} G_\infty(y, s) \ge \lim_{\|y\| \to \infty} c_s^T y = \infty$. Thus, for every $s$, at least one finite vector achieves the global minimum of $G_\infty(y, s)$; $\mathcal{B}_\infty(s)$ is a nonempty closed convex set; and $b_\infty(s)$, $f^1_\infty(\cdot, s)$, and $f^2_\infty(\cdot, s)$ are well-defined. The structure of the optimal policy outlined in (b) then follows from the same line of analysis used to prove the structure of the optimal policy in the induction step of Theorem 8.

Moreover, since for a fixed $s \in \mathcal{S}$ and $x^2 \in [d^2, \infty)$,

\[ f_n^1(x^2, s) := \inf \Big\{ \arg\min_{y^1 \in [d^1, \infty)} \{ G_n(y^1, x^2, s^1, s^2) \} \Big\} = \inf \big\{ b^1 \mid G_n'^{+}(b^1, x^2, s^1, s^2) \ge 0 \big\}, \]

the convergence of $f_n^1(x^2, s)$ to $f_\infty^1(x^2, s)$ follows from the same argument used to show (149). The convergence of $f_n^2(x^1, s)$ to $f_\infty^2(x^1, s)$ follows from a symmetric argument.
For all $s \in \mathcal{S}$ and $x^1 \in [d^1, \infty)$, define:

\[ \Psi_n(x^1, s) := \min_{y^2 \in [d^2, \infty)} \{ G_n(x^1, y^2, s^1, s^2) \} = G_n(x^1, f_n^2(x^1, s), s^1, s^2), \quad \forall n \in \mathbb{N}, \]

and

\[ \Psi_\infty(x^1, s) := \min_{y^2 \in [d^2, \infty)} \{ G_\infty(x^1, y^2, s^1, s^2) \} = G_\infty(x^1, f_\infty^2(x^1, s), s^1, s^2). \]

For fixed but arbitrary $x^1$ and $s$, $f_n^2(x^1, s)$ converges to $f_\infty^2(x^1, s)$, and, by Dini's Theorem, $G_n(x^1, \cdot, s)$ converges to $G_\infty(x^1, \cdot, s)$ uniformly on a compact interval containing $f_\infty^2(x^1, s)$. Thus, $\Psi_n(x^1, s)$ converges pointwise to $\Psi_\infty(x^1, s)$. Moreover, for every $s$, $\{\Psi_n(x^1, s)\}_{n \in \mathbb{N}}$ and $\Psi_\infty(x^1, s)$ are all convex in $x^1$ with the limit as $x^1$ approaches infinity equal to infinity. Therefore, by the same argument used to show (149), $b_n^1(s)$ converges pointwise to $b_\infty^1(s)$.

For all $s \in \mathcal{S}$ and $x^2 \in [d^2, \infty)$, define:

\[ \Phi_n(x^2, s) := G_n(b_n^1(s), x^2, s^1, s^2), \quad \forall n \in \mathbb{N}, \]

and

\[ \Phi_\infty(x^2, s) := G_\infty(b_\infty^1(s), x^2, s^1, s^2). \]

For fixed but arbitrary $x^2$ and $s$, $b_n^1(s)$ converges to $b_\infty^1(s)$, and, by Dini's Theorem, $G_n(\cdot, x^2, s)$ converges to $G_\infty(\cdot, x^2, s)$ uniformly on a compact interval around $b_\infty^1(s)$. Thus, $\Phi_n(x^2, s)$ converges pointwise to $\Phi_\infty(x^2, s)$. Moreover, for every $s$, $\{\Phi_n(x^2, s)\}_{n \in \mathbb{N}}$ and $\Phi_\infty(x^2, s)$ are all convex in $x^2$ with the limit as $x^2$ approaches infinity equal to infinity. Therefore, by the same argument used to show (149), $b_n^2(s)$ converges pointwise to $b_\infty^2(s)$, and we conclude $b_\infty(s) = \lim_{n \to \infty} b_n(s)$. □

X. Appendix C - Infinite Horizon Average Expected Cost Proofs 



In this section, we prove Theorem 10 using the vanishing discount approach (see, e.g., [43]). The proof of Theorem 6 is nearly identical, and we note the few key differences.

A. Proof of Theorem 10

Substituting (27) and (29) into the $\alpha$-DCOE (26) and rearranging yields

\[ (1 - \alpha) \cdot m_{\infty, \alpha} + w_{\infty, \alpha}(x, s) = \min_{y \in \tilde{\mathcal{A}}^d(x, s)} \big\{ c_s^T [y - x] + h(y - d) + \alpha \cdot \mathbb{E}[w_{\infty, \alpha}(y - d, S') \mid S = s] \big\}, \quad \forall x \in \mathbb{R}^2_+, \forall s \in \mathcal{S}. \tag{166} \]

The main idea of the vanishing discount approach is to take the limit as $\alpha$ goes to 1, and show that (166) converges to the ACOE (30).


We start by presenting five conditions from the literature on the vanishing discount approach.

Condition (G). $\rho^* := \inf_{\pi} \inf_{x, s} \big\{ \limsup_{n \to \infty} \frac{1}{n} V^{\pi}_{n, 1}(x, s) \big\} < \infty$.

Condition (W). (i) The state space $\mathbb{R}^2_+ \times \mathcal{S}$ is a locally compact space with countable base.

(ii) The action space $\tilde{\mathcal{A}}^d(x, s)$ is a nonempty compact subset of the state space $\mathbb{R}^2_+ \times \mathcal{S}$, and the multifunction $\phi : (x, s) \mapsto \tilde{\mathcal{A}}^d(x, s)$ is upper semicontinuous; that is, $\phi^{-1}(F)$ is closed in $\mathbb{R}^2_+ \times \mathcal{S}$ for every closed set $F \subset \mathbb{R}^2_+$.

(iii) The transition law is weakly continuous (see, e.g., [43, Appendix C]).

(iv) The one-stage cost $c(z, s) + h(x + z - d)$ is lower semicontinuous and nonnegative.

Condition (B). $\sup_{\alpha < 1} w_{\infty, \alpha}(x, s) < \infty$ for all $x \in \mathbb{R}^2_+$ and $s \in \mathcal{S}$.

Condition (B2). There is a measurable function $R : \mathbb{R}^2_+ \times \mathcal{S} \to \mathbb{R}_+$ such that $R \ge w_{\infty, \alpha}$ for all $\alpha \in [0, 1)$, and:

\[ \mathbb{E}[R(y - d, S') \mid S = s] < \infty, \quad \forall (x, s) \in \mathbb{R}^2_+ \times \mathcal{S}, \; \forall y \in \tilde{\mathcal{A}}^d(x, s). \tag{167} \]

Condition (E). For every increasing sequence of discount factors $\{\alpha(l)\}_{l = 1, 2, \ldots}$ approaching 1, the sequence $\{w_{\infty, \alpha(l)}\}_{l = 1, 2, \ldots}$ is equicontinuous.

We show below that our model satisfies these five conditions, but first we show how they lead to Theorem 10. Parts (b), (c), and (e) of Theorem 10 follow directly from the following theorem due to Schäl [82, Theorem 3.8] and adapted to our notation.

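Theorem 11 (Schäl, 1993). Suppose Conditions (G), (W), and (B) hold. Then the minimum average cost $\rho^* = \inf_{\pi} \inf_{x \in \mathbb{R}_+^2,\, s \in \mathcal{S}} \left\{ \limsup_{n \to \infty} \frac{1}{n} V^{\pi}_{n,1}(x,s) \right\} = \lim_{\alpha \uparrow 1} (1 - \alpha) \cdot m_{\infty,\alpha}$. Moreover, there exists an optimal selector $y^*_{\infty,1}(\cdot,\cdot)$ such that:

$$\begin{aligned}
\rho^* + w_{\infty,1}(x,s) &\ge \min_{y \in \mathcal{A}^{d}(x,s)} \left\{ c_s^{T}[y - x] + h(y - d) + \mathbb{E}\left[ w_{\infty,1}(y - d, S') \mid S = s \right] \right\} \\
&= c_s^{T}\left[ y^*_{\infty,1}(x,s) - x \right] + h\left( y^*_{\infty,1}(x,s) - d \right) + \mathbb{E}\left[ w_{\infty,1}\left( y^*_{\infty,1}(x,s) - d, S' \right) \mid S = s \right],
\end{aligned} \tag{168}$$

where for every $(x,s) \in \mathbb{R}_+^2 \times \mathcal{S}$ and any increasing sequence of discount factors $\{\alpha(l)\}_{l=1,2,\ldots}$ approaching 1,

$$w_{\infty,1}(x,s) := \liminf_{l \to \infty} w_{\infty,\alpha(l)}(x,s). \tag{169}$$

Furthermore, for every $(x,s) \in \mathbb{R}_+^2 \times \mathcal{S}$ and any increasing sequence of discount factors $\{\alpha(l)\}_{l=1,2,\ldots}$ approaching 1, there exists a subsequence $\{\alpha(l_i)\}_{i=1,2,\ldots}$ approaching 1 and a sequence $\{x(i)\}_{i=1,2,\ldots}$ approaching $x$ such that:

$$y^*_{\infty,1}(x,s) = \lim_{i \to \infty} y^*_{\infty,\alpha(l_i)}(x(i), s).$$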



To get the opposite inequality from (168), we use a method from [83] and [84, Theorem 4.1] (which is presented in [43, Section 5.5]). Namely, for every $x \in \mathbb{R}_+^2$, $s \in \mathcal{S}$, $y \in \mathcal{A}^{d}(x,s)$, and $\alpha(l)$ from (169), (166) implies:

$$(1 - \alpha(l)) \cdot m_{\infty,\alpha(l)} + w_{\infty,\alpha(l)}(x,s) \le c_s^{T}[y - x] + h(y - d) + \alpha(l) \cdot \mathbb{E}\left[ w_{\infty,\alpha(l)}(y - d, S') \mid S = s \right]. \tag{170}$$



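Furthermore, in combination with Conditions (B) and (E), the Arzelà-Ascoli Theorem implies there exists a subsequence $\{\alpha(l_i)\}_{i=1,2,\ldots}$ of $\{\alpha(l)\}_{l=1,2,\ldots}$ such that:

$$w_{\infty,1}(x,s) = \lim_{i \to \infty} w_{\infty,\alpha(l_i)}(x,s), \quad \forall x \in \mathbb{R}_+^2,\ \forall s \in \mathcal{S}. \tag{171}$$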



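Then, taking the limit of (170) as $\alpha$ goes to 1 along the sequence $\{\alpha(l_i)\}_{i=1,2,\ldots}$, (28), (171), Condition (B2), and the Lebesgue Dominated Convergence Theorem imply:

$$\rho^* + w_{\infty,1}(x,s) \le c_s^{T}[y - x] + h(y - d) + \mathbb{E}\left[ w_{\infty,1}(y - d, S') \mid S = s \right], \quad \forall x \in \mathbb{R}_+^2,\ \forall s \in \mathcal{S},\ \forall y \in \mathcal{A}^{d}(x,s),$$

which implies:

$$\rho^* + w_{\infty,1}(x,s) \le \min_{y \in \mathcal{A}^{d}(x,s)} \left\{ c_s^{T}[y - x] + h(y - d) + \mathbb{E}\left[ w_{\infty,1}(y - d, S') \mid S = s \right] \right\}, \quad \forall x \in \mathbb{R}_+^2,\ \forall s \in \mathcal{S}. \tag{172}$$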



Equations (168) and (172) yield the ACOE (30). Moreover, from (171) and the fact that convexity and supermodularity are preserved under pointwise limits, we conclude that for every $s \in \mathcal{S}$, $w_{\infty,1}(x, s)$ is convex and supermodular in $x$. Then, by the same argument as the one used in Theorems 8 and 9, there exists an optimal stationary policy with the same structure as statement (b) in Theorem 9 that minimizes the right-hand side of the ACOE.

Thus, it just remains to show our model satisfies the five conditions. We proceed in order, beginning with Condition (G). Consider again the policy $\pi^{d}$ of transmitting $d^1$ packets to user 1 and $d^2$ packets to user 2 in every slot, regardless of channel condition. Let the initial vector of buffer levels $x_0 = (0, 0)$, and let the initial vector of channel conditions $s_0$ be arbitrary. Then we have:

$$\rho^* := \inf_{\pi} \inf_{x \in \mathbb{R}_+^2,\, s \in \mathcal{S}} \left\{ \limsup_{n \to \infty} \frac{1}{n} V^{\pi}_{n,1}(x, s) \right\} \le \limsup_{n \to \infty} \frac{1}{n} V^{\pi^{d}}_{n,1}(x_0, s_0) \le c_{\max}^{T} d < \infty,$$

where $c_{\max}$ is defined in (163).

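The only nontrivial statement in Condition (W) is the weak continuity of the transition law. Let $\{x_i\}_{i=1,2,\ldots}$, $\{s_i\}_{i=1,2,\ldots}$, and $\{y_i\}_{i=1,2,\ldots}$ be sequences approaching $x$, $s$, and $y$, respectively, and let $\Gamma$ be a bounded, continuous function on $\mathbb{R}_+^2 \times \mathcal{S}$. We need to show:

$$\lim_{i \to \infty} \mathbb{E}\left[ \Gamma(X', S') \mid X = x_i, S = s_i, Y = y_i \right] = \mathbb{E}\left[ \Gamma(X', S') \mid X = x, S = s, Y = y \right].$$

This is true, as

$$\begin{aligned}
\lim_{i \to \infty} \mathbb{E}\left[ \Gamma(X', S') \mid X = x_i, S = s_i, Y = y_i \right]
&= \lim_{i \to \infty} \sum_{s' \in \mathcal{S}} \Pr(S' = s' \mid S = s_i) \cdot \Gamma(y_i - d, s') \\
&= \sum_{s' \in \mathcal{S}} \left[ \lim_{i \to \infty} \Pr(S' = s' \mid S = s_i) \right] \cdot \left[ \lim_{i \to \infty} \Gamma(y_i - d, s') \right] \\
&= \sum_{s' \in \mathcal{S}} \Pr(S' = s' \mid S = s) \cdot \Gamma(y - d, s') \\
&= \mathbb{E}\left[ \Gamma(X', S') \mid X = x, S = s, Y = y \right].
\end{aligned}$$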
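The chain of equalities above can be checked numerically, because with a finite channel set the conditional expectation is a finite sum. The sketch below is only an illustration under assumed values: the drain vector `D`, the transition table `Q`, and the test function `gamma` are hypothetical choices, and the channel state is held fixed since a convergent sequence in a finite set is eventually constant.

```python
D = (1.0, 1.0)                                    # per-slot drain of the two receivers (toy values)
Q = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.4, 1: 0.6}}    # Pr(S' = s' | S = s) on a finite channel set


def gamma(x, s_next):
    """A bounded, continuous test function on the state space (toy choice)."""
    return 1.0 / (1.0 + abs(x[0]) + abs(x[1])) + 0.5 * s_next


def expected_gamma(s, y):
    """E[Gamma(X', S') | S = s, Y = y] with X' = y - d, written as a finite sum over s'."""
    x_next = (y[0] - D[0], y[1] - D[1])
    return sum(p * gamma(x_next, s2) for s2, p in Q[s].items())


def gap(i, s=0, y=(2.0, 3.0)):
    """Distance between the expectation at y_i = y + 1/i and at the limit point y."""
    y_i = (y[0] + 1.0 / i, y[1] + 1.0 / i)
    return abs(expected_gamma(s, y_i) - expected_gamma(s, y))


if __name__ == "__main__":
    for i in (10, 100, 10000):
        print(i, gap(i))
```

The gap shrinks as $y_i$ approaches $y$, which is all that weak continuity asks for in this finite-channel setting.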

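Next, we prove Conditions (B) and (B2). Let $\alpha \in [0, 1)$ be arbitrary. For every $s \in \mathcal{S}$, $V_{\infty,\alpha}(x, s)$ is convex in $x$, and

$$\lim_{\|x\| \to \infty} V_{\infty,\alpha}(x, s) \ge \lim_{\|x\| \to \infty} h(x - d) = \infty,$$

so there exists an $x^*(s) \in \mathbb{R}_+^2$ such that:

$$\min_{x \in \mathbb{R}_+^2} \left\{ V_{\infty,\alpha}(x,s) \right\} = V_{\infty,\alpha}(x^*(s), s).$$

Define:

$$s^* := \arg\min_{s \in \mathcal{S}} \left\{ V_{\infty,\alpha}(x^*(s), s) \right\},$$

so that

$$m_{\infty,\alpha} = V_{\infty,\alpha}(x^*(s^*), s^*).$$

Footnote (on the bound $c_{\max}^{T} d$ used to verify Condition (G)): For the proof of Theorem 6, we use $c_{\max}$ defined in (147) instead.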

Define also the stationary policy $\bar{\pi} = (\bar{y}, \bar{y}, \ldots)$, where $\bar{y}(x, s) := \left( \bar{y}^1(x^1, s^1), \bar{y}^2(x^2, s^2) \right)$, and for $m \in \{1, 2\}$,

$$\bar{y}^m(x^m, s^m) := \begin{cases} x^m, & \text{if } x^{m*}(s^*) + d^m \le x^m \\ \min\left\{ x^{m*}(s^*) + d^m,\ x^m + \dfrac{P/2}{c_{s^m}} \right\}, & \text{if } x^m < x^{m*}(s^*) + d^m. \end{cases}$$

The stationary policy $\bar{\pi}$ calls for the scheduler to allocate at most $P/2$ units of power for transmission to each user, and tries to bring receiver $m$'s buffer towards $x^{m*}(s^*) + d^m$ (before transmission), regardless of the random channel conditions. For $m \in \{1, 2\}$, let $\tau^m(x^m, s^m)$ be the random number of time slots until receiver $m$'s buffer level at the beginning of a slot reaches $x^{m*}(s^*)$ under policy $\bar{\pi}$, starting from state $(x^m, s^m)$. Define also $\tau_{\max}(x, s) := \max\left\{ \tau^1(x^1, s^1), \tau^2(x^2, s^2) \right\}$, and $\tau_{\min}(x, s) := \min\left\{ \tau^1(x^1, s^1), \tau^2(x^2, s^2) \right\}$. Note that if $x^m \ge x^{m*}(s^*)$, then $\tau^m(x^m, s^m) = \left\lceil \frac{x^m - x^{m*}(s^*)}{d^m} \right\rceil$, and the total discounted expected transmission and holding cost associated with receiver $m$ for the first $\tau^m(x^m, s^m)$ slots is upper bounded by:

$$c^m_{\max} \cdot d^m + \sum_{t=1}^{\tau^m(x^m, s^m) - 1} h^m\left( x^m - t \cdot d^m \right). \tag{173}$$

Footnote (on the policy $\bar{\pi}$): For the proof of Theorem 6, the policy $\bar{\pi}$ calls for the scheduler to allocate the full $P$ units of power for transmission to the single receiver when its buffer is below $x^*(s^*) + d$. The bounds are adjusted accordingly.
On the other hand, if $x^m < x^{m*}(s^*)$, $\mathbb{E}[\tau^m(x^m, s^m)]$ is finite. Therefore, by Wald's Lemma, the total discounted expected transmission and holding cost associated with receiver $m$ for the first $\tau^m(x^m, s^m)$ slots is upper bounded by:

$$\mathbb{E}\left[ \sum_{t=1}^{\tau^m(x^m, s^m)} \alpha^{t-1} \left( \frac{P}{2} + h^m\left( x^{m*}(s^*) \right) \right) \right] \le \mathbb{E}\left[ \tau^m(x^m, s^m) \right] \cdot \left[ \frac{P}{2} + h^m\left( x^{m*}(s^*) \right) \right]. \tag{174}$$
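The bound above can be illustrated by simulation. The following is a rough sketch under strong assumptions that are not taken from the paper: a single receiver, an i.i.d. two-state channel, integer buffer levels, and made-up values for the half power budget, the per-packet costs, and the per-slot cost bound `k`. The function names are hypothetical.

```python
import random


def hitting_time(x0, target, d, half_power, costs, probs, rng):
    """Slots until the start-of-slot buffer first reaches `target`, starting below it."""
    x, t = x0, 0
    while x < target:
        c = rng.choices(costs, weights=probs)[0]     # i.i.d. channel draw (toy assumption)
        sent = min(half_power / c, target + d - x)   # spend at most P/2 and never overshoot
        x = x + sent - d                             # d packets are drained during the slot
        t += 1
    return t


def discounted_cost(tau, alpha, k):
    """Sum of alpha^(t-1) * k over the first tau slots, with k a per-slot cost bound."""
    return sum(alpha ** (t - 1) * k for t in range(1, tau + 1))


def estimate(n_runs, seed, alpha=0.95, k=3.0):
    """Monte Carlo estimates of E[tau], the discounted cost bound, and k * E[tau]."""
    rng = random.Random(seed)
    taus = [hitting_time(0.0, 5.0, 1.0, 2.0, [1.0, 2.0], [0.5, 0.5], rng)
            for _ in range(n_runs)]
    mean_tau = sum(taus) / n_runs
    lhs = sum(discounted_cost(t, alpha, k) for t in taus) / n_runs
    return mean_tau, lhs, k * mean_tau


if __name__ == "__main__":
    print(estimate(20000, seed=0))
```

In this toy setup the buffer grows by one packet only in the good channel state, so the hitting time has mean $5 / 0.5 = 10$ slots, and the discounted cost never exceeds $k$ times the hitting time because $\alpha^{t-1} \le 1$.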



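So for $m \in \{1, 2\}$, we define:

$$R^m(x^m, s^m) := \begin{cases} c^m_{\max} \cdot d^m + \displaystyle\sum_{t=1}^{\tau^m(x^m, s^m) - 1} h^m\left( x^m - t \cdot d^m \right), & \text{if } x^{m*}(s^*) \le x^m \\ \mathbb{E}\left[ \tau^m(x^m, s^m) \right] \cdot \left[ \dfrac{P}{2} + h^m\left( x^{m*}(s^*) \right) \right], & \text{if } x^m < x^{m*}(s^*). \end{cases}$$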

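Next, let $\tau_{\text{switch}}(x, s)$ be the random number of time slots until the state $(x^*(s^*), s^*)$ is reached at the beginning of a slot under policy $\bar{\pi}$, starting from state $(x, s)$. We define a new policy $\tilde{\pi}$ that follows $\bar{\pi}$ for $\tau_{\text{switch}}(x, s)$ slots (a random stopping time), and then behaves optimally. Then we have:

$$V_{\infty,\alpha}(x, s) \le R(x, s) + V_{\infty,\alpha}(x^*(s^*), s^*), \tag{175}$$

where

$$R(x, s) := R^1(x^1, s^1) + R^2(x^2, s^2) + \mathbb{E}\left[ \tau_{\text{switch}}(x, s) - \tau_{\min}(x, s) \right] \cdot \left[ c_{\max}^{T} d + h^1\left( x^{1*}(s^*) \right) + h^2\left( x^{2*}(s^*) \right) \right]. \tag{176}$$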



The third term in (176) is an upper bound on the transmission and holding costs required to keep the vector of buffer levels at $x^*(s^*)$ while waiting for the vector of channel condition realizations to reach $s^*$. Since the vector of channel conditions is a finite-state ergodic Markov process, this quantity is finite. Equation (175) implies:

$$w_{\infty,\alpha}(x, s) = V_{\infty,\alpha}(x, s) - V_{\infty,\alpha}(x^*(s^*), s^*) \le R(x, s) < \infty.$$

Footnote (on the finiteness of $\mathbb{E}[\tau^m(x^m, s^m)]$ when $x^m < x^{m*}(s^*)$): In order to guarantee $\mathbb{E}[\tau^m(x^m, s^m)]$ is finite, we actually need an additional assumption that $\Pr\left( \frac{P/2}{c_{S^m}} = d^m \right) < 1$. However, this assumption is harmless, for if it is not true, the channel condition does not vary over time, a scenario outside of our scope of interest.

The important thing to note here is that the bounding function $R(x, s)$ is independent of $\alpha$, so Condition (B) holds. The function $R(x, s)$ is also measurable and satisfies (167), so Condition (B2) also holds.

Finally, Condition (E) follows from the fact that for every $l \in \{1, 2, \ldots\}$ and $s \in \mathcal{S}$, $w_{\infty,\alpha(l)}(\cdot, s)$ is convex. Thus, by the finiteness of $\mathcal{S}$ and essentially the same argument used by Fernández-Gaucherand, Marcus, and Arapostathis in [83, pp. 178-179], $\{w_{\infty,\alpha(l)}\}_{l=1,2,\ldots}$ is locally equi-Lipschitzian and equicontinuous.
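For orientation, the standard convexity fact that typically underlies such an equi-Lipschitz conclusion can be written out as follows. This is a generic sketch and not necessarily the exact argument of [83]; the ball $B(x_0, 2r)$, the bounds $m$ and $M$, and the auxiliary point $z$ are notation introduced here for illustration, and the sketch assumes the ball lies inside the domain.

```latex
% Assumption: f is convex on the closed ball B(x_0, 2r), with m <= f <= M there.
% Claim: f is Lipschitz on B(x_0, r) with constant (M - m)/r.
\begin{align*}
&\text{For } u \neq v \text{ in } B(x_0, r), \text{ set }
  z := v + r\,\frac{v - u}{\|v - u\|} \in B(x_0, 2r),
  \qquad \lambda := \frac{\|v - u\|}{\|v - u\| + r}. \\
&\text{Then } v = \lambda z + (1 - \lambda) u, \text{ so by convexity} \\
&f(v) - f(u) \le \lambda \left[ f(z) - f(u) \right]
  \le \lambda\,(M - m) \le \frac{M - m}{r}\,\|v - u\|. \\
&\text{Exchanging the roles of } u \text{ and } v \text{ gives }
  |f(v) - f(u)| \le \frac{M - m}{r}\,\|v - u\|.
\end{align*}
```

The Lipschitz constant depends only on $m$, $M$, and $r$. Applied to the family $w_{\infty,\alpha(l)}(\cdot, s)$, one may take $m = 0$ (each function is a value function minus its minimum) and $M$ equal to a bound on $R(\cdot, s)$ over the larger ball, provided $R$ is bounded there, so the same constant works for every $l$.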